Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 2


Slides adapted from Noah Smith.

Administrivia

Make sure that you enroll in Moodle and have access to Piazza. Email me to introduce yourself, one of your core values, and a machine learning application you care about.

Learning Objectives

Understand the difference between memorization and generalization.
Understand feature extraction.
Understand the basics of decision trees.

Outline

Memorization vs. Generalization
Features
Decision tree


Memorization vs. Generalization

What do you think are the differences?


Memorization vs. Generalization

Task: Given a dataset that contains transcripts at CU, predict
whether a student is going to take CSCI 4622;
whether Taylor is going to take this class;
whether Bill Gates is going to take this class.


Memorization vs. Generalization

training data vs. test set (formal definition in the next lecture)


Features

"Republican nominee George Bush said he felt nervous as he voted today in his adopted home state of Texas, where he ended..." (From Chris Harrison's WikiViz)


Features

Let φ be a function that maps from inputs (x) to values.
If φ maps to {0, 1}, we call it a binary feature (function).
If φ maps to R, we call it a real-valued feature (function).
Feature functions can map to categorical values, ordinal values, integers, and more.
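As a concrete sketch (the two example features below are our own illustrations, not from the slides), feature functions over text inputs might look like:

```python
# Minimal sketch of feature functions phi: input x -> value.
# Both example features are illustrative, not taken from the slides.

def phi_binary(x: str) -> int:
    """Binary feature: does the text mention 'machine learning'? Maps to {0, 1}."""
    return 1 if "machine learning" in x.lower() else 0

def phi_real(x: str) -> float:
    """Real-valued feature: average word length of the text. Maps to R."""
    words = x.split()
    return sum(len(w) for w in words) / len(words) if words else 0.0

doc = "Machine learning maps inputs to outputs"
print(phi_binary(doc))  # 1
print(phi_real(doc))
```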


Features

Let us have an interactive example to think through data representation! Auto insurance quotes:

id  rent  income  urban  state  car value  car year
1   yes   5,      no     CO     2,         2 2
2   yes   7,      no     CO     3,         22
3   no    25,     yes    CO     55,        27
4   yes   2,      yes    NY     5,         26
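One way to think about representing a record like these: binary fields become {0, 1} features, categorical fields become one-hot features, and numeric fields stay real-valued. A minimal sketch; the record's numeric values are illustrative placeholders of our own, not the slide's data:

```python
# Sketch: turning one auto-insurance record into a feature vector.
# The numeric values below are illustrative placeholders, not real data.

record = {"rent": "yes", "income": 50000, "urban": "no",
          "state": "CO", "car_value": 20000, "car_year": 2012}

STATES = ["CO", "NY"]  # categories observed in the table

def featurize(r):
    feats = []
    feats.append(1 if r["rent"] == "yes" else 0)               # binary feature
    feats.append(1 if r["urban"] == "yes" else 0)              # binary feature
    feats.extend(1 if r["state"] == s else 0 for s in STATES)  # one-hot encoding
    feats.append(r["income"])                                  # real-valued
    feats.append(r["car_value"])                               # real-valued
    feats.append(r["car_year"])                                # ordinal/integer
    return feats

print(featurize(record))  # [1, 0, 1, 0, 50000, 20000, 2012]
```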

Features: Understanding assumptions in features

[The slide shows four example documents: the Wikipedia articles for "Akaike information criterion," "Cat," "Princeton University," and "Dog."]

The methods we'll study make assumptions about the data on which they are applied. E.g.:
Documents can be analyzed as a sequence of words, or as a bag of words.
Documents can be treated as independent of each other, or as connected to each other.
What are the assumptions behind the methods? When/why are they appropriate? Much of this is an art, and it is inherently dynamic.
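The sequence-vs-bag-of-words distinction can be made concrete in a few lines (a minimal sketch; the example sentences are our own):

```python
# Sketch: the same document under two different representational assumptions.
from collections import Counter

doc = "the cat chased the dog"

# As a sequence of words: order matters.
as_sequence = doc.split()

# As a bag of words: only counts matter; order is discarded.
as_bag = Counter(doc.split())

print(as_sequence)  # ['the', 'cat', 'chased', 'the', 'dog']
print(as_bag)       # Counter({'the': 2, 'cat': 1, 'chased': 1, 'dog': 1})

# Under the bag-of-words assumption these two documents are indistinguishable:
assert Counter("the cat chased the dog".split()) == Counter("the dog chased the cat".split())
```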


Data derived from https://archive.ics.uci.edu/ml/datasets/auto+mpg:
mpg; cylinders; displacement; horsepower; weight; acceleration; year; origin

Goal: predict whether mpg is < 23 ("bad" = 0) or at least 23 ("good" = 1) given the other attributes (columns). There are 201 "good" and 197 "bad" examples; guessing the most frequent class (good) will get 50.5% accuracy.
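A minimal sketch of that majority-class baseline, using the slide's label counts (201 good and 197 bad out of 398 examples):

```python
# Sketch of the majority-class baseline on the auto-mpg labels:
# 201 "good" (mpg >= 23) and 197 "bad" examples.
from collections import Counter

labels = ["good"] * 201 + ["bad"] * 197

counts = Counter(labels)
majority_class, majority_count = counts.most_common(1)[0]
baseline_accuracy = majority_count / len(labels)

print(majority_class)               # good
print(round(baseline_accuracy, 3))  # 0.505, i.e., 50.5%
```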

Contingency Table

                values of feature φ
                v1    v2    ...    vK
values of y


Decision Stump Example

y \ maker   america   europe   asia
bad         174       14       9
good        75        56       70

root 197:201 (maker?)
  america → 174:75
  europe  → 14:56
  asia    → 9:70

Errors: 75 + 14 + 9 = 98 (about 25%)
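The stump's error count can be read straight off the contingency table: each leaf predicts its majority class, so it errs on the minority count. A minimal sketch using these counts:

```python
# Sketch: errors of a decision stump from a contingency table.
# Counts per maker value are (bad, good) pairs from the slide's table.

table = {"america": (174, 75), "europe": (14, 56), "asia": (9, 70)}

# Each leaf predicts its majority class, so it errs on the minority count.
errors = sum(min(bad, good) for bad, good in table.values())
total = sum(bad + good for bad, good in table.values())

print(errors)                    # 98 = 75 + 14 + 9
print(round(errors / total, 2))  # 0.25, i.e., about 25%
```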


Decision Stump Example

root 197:201 (cylinders?)
  3 → 3:1
  4 → 20:184
  5 → 1:2
  6 → 73:11
  8 → 100:3

Errors: 1 + 20 + 1 + 11 + 3 = 36 (about 9%)

Key Idea: Recursion

A single feature partitions the data. For each partition, we could choose another feature and partition further. Applying this recursively, we can construct a decision tree.

Decision Tree Example

root 197:201 (cylinders?)
  3 → 3:1
  4 → 20:184 (maker?)
      america → 17:65
      europe  → 0:53
      asia    → 3:66
  5 → 1:2
  6 → 73:11
  8 → 100:3

Error reduction compared to the cylinders stump?

Decision Tree Example

root 197:201 (cylinders?)
  3 → 3:1
  4 → 20:184
  5 → 1:2
  6 → 73:11 (maker?)
      america → 67:7
      europe  → 3:1
      asia    → 3:3
  8 → 100:3

Error reduction compared to the cylinders stump?


Decision Tree Example

root 197:201 (cylinders?)
  3 → 3:1
  4 → 20:184 (φ?)
      12:169 and 8:15
  5 → 1:2
  6 → 73:11 (φ?)
      73:0 and 0:11
  8 → 100:3

Error reduction compared to the cylinders stump?
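A sketch of the error-reduction question, assuming the branch counts reconstructed from the slides: splitting the 6-cylinder branch into the pure partitions 73:0 and 0:11 removes all 11 of that branch's errors, while the other splits shown leave their branch totals unchanged.

```python
# Sketch: error counts before and after expanding the cylinders stump,
# using (bad, good) counts reconstructed from the slides.

def leaf_errors(bad, good):
    # A leaf predicts its majority class, so it errs on the minority count.
    return min(bad, good)

# Cylinders stump leaves: value -> (bad, good)
stump = {3: (3, 1), 4: (20, 184), 5: (1, 2), 6: (73, 11), 8: (100, 3)}
stump_errors = sum(leaf_errors(b, g) for b, g in stump.values())
print(stump_errors)  # 36

# Splitting the 6-cylinder branch with a binary feature phi yields pure
# leaves 73:0 and 0:11, removing all 11 of that branch's errors.
expanded_errors = (stump_errors - leaf_errors(73, 11)
                   + leaf_errors(73, 0) + leaf_errors(0, 11))
print(expanded_errors)  # 25
```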


Decision Tree: Making a Prediction

(Each node carries its counts n:p of negative and positive examples.)

root (φ1?)
  φ1 = 0 → leaf
  φ1 = 1 → φ2?
    φ2 = 0 → φ3? → leaf, leaf
    φ2 = 1 → φ4? → leaf, leaf

Algorithm: DTREETEST
Data: decision tree t, input example x
Result: predicted class
if t has the form LEAF(y) then
    return y
else
    # t.φ is the feature associated with t
    # t.child(v) is the subtree for value v
    return DTREETEST(t.child(t.φ(x)), x)
end
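The DTREETEST pseudocode translates almost line-for-line into Python. A minimal sketch; the Leaf/Node classes are our own representation, not from the slides:

```python
# Minimal sketch of DTREETEST: recursive prediction in a decision tree.

class Leaf:
    def __init__(self, y):
        self.y = y  # the class predicted at this leaf

class Node:
    def __init__(self, phi, children):
        self.phi = phi            # feature function for this node
        self.children = children  # dict: feature value -> subtree

def dtree_test(t, x):
    if isinstance(t, Leaf):
        return t.y
    # Evaluate this node's feature on x and descend into the matching subtree.
    return dtree_test(t.children[t.phi(x)], x)

# Usage: a tiny tree that first tests phi1, then phi2.
phi1 = lambda x: x["a"]
phi2 = lambda x: x["b"]
tree = Node(phi1, {0: Leaf("neg"),
                   1: Node(phi2, {0: Leaf("neg"), 1: Leaf("pos")})})
print(dtree_test(tree, {"a": 1, "b": 1}))  # pos
```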

Decision Tree: Making a Prediction

Equivalent boolean formulas (each leaf predicts the positive class iff n < p at that leaf):

(φ1 = 0)
(φ1 = 1) ∧ (φ2 = 0) ∧ (φ3 = 0)
(φ1 = 1) ∧ (φ2 = 0) ∧ (φ3 = 1)
(φ1 = 1) ∧ (φ2 = 1) ∧ (φ4 = 0)
(φ1 = 1) ∧ (φ2 = 1) ∧ (φ4 = 1)


Tangent: How Many Formulas?

Assume we have D binary features. Each feature could be set to 0, set to 1, or excluded (wildcard/don't care). That gives 3^D formulas.
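The 3^D count is easy to sanity-check by brute-force enumeration (a small sketch; '*' stands for the wildcard/don't-care option):

```python
# Sketch: counting conjunctive formulas over D binary features by
# enumeration; each feature is set to 0, set to 1, or wildcarded ('*').
from itertools import product

D = 3
formulas = list(product([0, 1, "*"], repeat=D))
print(len(formulas))  # 27, which equals 3**D
```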


Growing a Decision Tree

root n:p (φ1?)
  φ1 = 0 → n0:p0
  φ1 = 1 → n1:p1

We chose feature φ1. Note that n = n0 + n1 and p = p0 + p1.

Growing a Decision Tree

root n:p (φ1?)
  φ1 = 0 → n0:p0
  φ1 = 1 → n1:p1

We chose not to split the left partition. Why not?


Growing a Decision Tree

root n:p (φ1?)
  φ1 = 0 → leaf
  φ1 = 1 → φ2?
    φ2 = 0 → φ3? → leaf, leaf
    φ2 = 1 → φ4? → leaf, leaf


Greedily Building a Decision Tree (Binary Features)

Algorithm: DTREETRAIN
Data: data D, feature set Φ
Result: decision tree
if all examples in D have the same label y, or Φ is empty and y is the best guess then
    return LEAF(y)
else
    for each feature φ in Φ do
        partition D into D0 and D1 based on φ-values
        let mistakes(φ) = (non-majority answers in D0) + (non-majority answers in D1)
    end
    let φ* be the feature with the smallest number of mistakes
    return NODE(φ*, {DTREETRAIN(D0, Φ \ {φ*}), DTREETRAIN(D1, Φ \ {φ*})})
end

Does this algorithm always terminate? Why?
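A minimal sketch of DTREETRAIN in Python for binary features, following the pseudocode; the data representation (a list of (x, y) pairs with dict-valued features) and the predict helper are our own choices, not from the slides:

```python
# Sketch of greedy decision tree training over binary features.
from collections import Counter

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def mistakes(part):
    # Non-majority answers within one partition.
    labels = [y for _, y in part]
    return len(labels) - Counter(labels).most_common(1)[0][1] if labels else 0

def dtree_train(data, features):
    labels = [y for _, y in data]
    if len(set(labels)) == 1 or not features:
        return ("leaf", majority(labels))
    # Greedily pick the feature whose one-level stump makes the fewest mistakes.
    def score(phi):
        d0 = [(x, y) for x, y in data if x[phi] == 0]
        d1 = [(x, y) for x, y in data if x[phi] == 1]
        return mistakes(d0) + mistakes(d1)
    best = min(features, key=score)
    d0 = [(x, y) for x, y in data if x[best] == 0]
    d1 = [(x, y) for x, y in data if x[best] == 1]
    if not d0 or not d1:  # degenerate split: stop and predict the majority
        return ("leaf", majority(labels))
    rest = features - {best}
    return ("node", best, dtree_train(d0, rest), dtree_train(d1, rest))

def predict(tree, x):
    if tree[0] == "leaf":
        return tree[1]
    _, phi, t0, t1 = tree
    return predict(t1 if x[phi] == 1 else t0, x)

# Usage on a toy dataset where y = a AND b.
data = [({"a": a, "b": b}, int(a and b)) for a in (0, 1) for b in (0, 1)]
tree = dtree_train(data, {"a", "b"})
print([predict(tree, x) for x, _ in data])  # [0, 0, 0, 1]
```

Termination follows because every recursive call removes the chosen feature from Φ, so the depth is bounded by the number of features.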