Multiclass and Multi-label Classification

INFO-4604, Applied Machine Learning
University of Colorado Boulder
September 21, 2017
Prof. Michael Paul

Today

Beyond binary classification
- All classifiers we've looked at so far have predicted one of two classes.
- We'll learn two main ways of predicting one of many classes:
  - Repurposing binary classifiers
  - Extending logistic regression

Outputting multiple labels
- Sometimes straightforward, but sometimes not
- Tricks for better results

Multiclass Classification

What color is the cat in this photo?

[Photos of three cats, labeled Calico, Orange Tabby, and Tuxedo]

Multiclass Classification

Multiclass classification refers to the setting when there are more than two possible class labels.

x1      x2      x3      x4      y
 1.01   -4.26    7.99   -0.03   Calico
 2.50    1.00    4.87    5.95   Orange Tabby
-2.34   -1.24   -0.88   -1.31   Tuxedo
 0.55    0.59   -3.08    1.27   Orange Tabby
 2.08   -3.46    4.62   -1.13   Gray Tabby

It's possible to create multiclass classifiers out of binary classifiers.

One versus Rest

One-vs-rest (or one-vs-all) classification involves training a binary classifier for each class. Each classifier predicts whether the instance belongs to the target class or not.

Calico classifier:

x1      x2      x3      x4      y
 1.01   -4.26    7.99   -0.03   Yes
 2.50    1.00    4.87    5.95   No
-2.34   -1.24   -0.88   -1.31   No
 0.55    0.59   -3.08    1.27   No
 2.08   -3.46    4.62   -1.13   No

Orange Tabby classifier:

x1      x2      x3      x4      y
 1.01   -4.26    7.99   -0.03   No
 2.50    1.00    4.87    5.95   Yes
-2.34   -1.24   -0.88   -1.31   No
 0.55    0.59   -3.08    1.27   Yes
 2.08   -3.46    4.62   -1.13   No
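The relabeling used to build each one-vs-rest training set can be sketched in a few lines of Python. This is only an illustration: the helper name `one_vs_rest_labels` is made up here, and the labels are the five training rows from the slides.

```python
# Multiclass labels for the five training rows shown on the slides.
labels = ["Calico", "Orange Tabby", "Tuxedo", "Orange Tabby", "Gray Tabby"]


def one_vs_rest_labels(labels, target):
    """Return binary labels: "Yes" where the row belongs to `target`, else "No".

    Hypothetical helper for illustration; not part of the lecture or any library.
    """
    return ["Yes" if y == target else "No" for y in labels]


# One binary training set per class, so K classes give K binary classifiers.
binary_tasks = {c: one_vs_rest_labels(labels, c) for c in sorted(set(labels))}

for target, ys in binary_tasks.items():
    print(f"{target} classifier: {ys}")
```

The features stay the same for every classifier; only the label column changes.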

One versus Rest

What color is the cat in this photo?

Classifier      Prediction
Calico          No
Orange Tabby    Yes
Tuxedo          No
Gray Tabby      No

We'll go with Orange Tabby as the best prediction.

What if multiple classifiers said yes?

Classifier      Prediction
Calico          No
Orange Tabby    Yes
Tuxedo          No
Gray Tabby      Yes

What if none of the classifiers said yes?

Classifier      Prediction
Calico          No
Orange Tabby    No
Tuxedo          No
Gray Tabby      No

One versus Rest

Instead of only using the final binary prediction of each classifier, consider the score associated with the prediction.
- Recall: we defined a classification score for the linear classifiers we've seen as the dot product $w^T x_i$.
- Other kinds of classifiers usually have some sort of score, but it might look different.

Go with whichever one-vs-rest classifier has the highest score (highest confidence in its prediction).

One versus Rest

What color is the cat in this photo?

Classifier      Score
Calico          -4.59
Orange Tabby     2.18
Tuxedo          -1.80
Gray Tabby       0.73

We'll go with Orange Tabby as the best prediction.
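A minimal sketch of the score-based decision rule, using the four scores from the slide. The function name `predict_one_vs_rest` is hypothetical, and the threshold of 0 for a "yes" vote is an assumption that matches the usual convention for linear classifiers, not something stated in the lecture.

```python
# Scores from the four one-vs-rest classifiers on the slide.
scores = {
    "Calico": -4.59,
    "Orange Tabby": 2.18,
    "Tuxedo": -1.80,
    "Gray Tabby": 0.73,
}


def predict_one_vs_rest(scores):
    """Return the class whose classifier produced the highest score."""
    return max(scores, key=scores.get)


prediction = predict_one_vs_rest(scores)

# Assumption: a linear classifier says "yes" when its score is above 0.
yes_votes = [c for c, s in scores.items() if s > 0]

print("Classifiers voting yes:", yes_votes)
print("Final prediction:", prediction)
```

Under that assumed threshold, two classifiers vote yes here, and taking the highest score still gives a single answer. The same rule also gives an answer when every score is negative.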

All Pairs

The all-pairs approach to multiclass classification trains a binary classifier for every pair of classes. Whichever class wins the most pairwise classifications is the final prediction.

Each pairwise classifier is trained on the same training table as before, using only the rows that belong to one of its two classes. For example:
- Calico vs Tuxedo classifier
- Calico vs Orange Tabby classifier
- Tuxedo vs Orange Tabby classifier

All Pairs

What color is the cat in this photo?

Classifier           Prediction
Calico vs Orange     Orange
Calico vs Tuxedo     Tuxedo
Calico vs Gray       Gray
Orange vs Tuxedo     Orange
Orange vs Gray       Orange

We'll go with Orange Tabby as the best prediction.
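The voting step can be sketched as follows. The winners are copied from the five pairwise results listed on the slide; with four classes there are six possible pairs, and the pair that is not listed on the slide is simply left out here rather than guessed. Variable names are illustrative.

```python
from collections import Counter
from itertools import combinations

classes = ["Calico", "Orange Tabby", "Tuxedo", "Gray Tabby"]

# With K classes, all pairs needs K * (K - 1) / 2 binary classifiers.
all_pairs = list(combinations(classes, 2))

# Winners of the five pairwise classifiers listed on the slide.
pairwise_winners = {
    ("Calico", "Orange Tabby"): "Orange Tabby",
    ("Calico", "Tuxedo"): "Tuxedo",
    ("Calico", "Gray Tabby"): "Gray Tabby",
    ("Orange Tabby", "Tuxedo"): "Orange Tabby",
    ("Orange Tabby", "Gray Tabby"): "Orange Tabby",
}

# Each pairwise classifier casts one vote for its winning class.
votes = Counter(pairwise_winners.values())
prediction, n_wins = votes.most_common(1)[0]

print("Number of pairwise classifiers needed:", len(all_pairs))
print("Votes:", dict(votes))
print("Final prediction:", prediction)
```

Ties are possible with this scheme. `Counter.most_common` breaks them by insertion order, which is an arbitrary choice; a real system might fall back on classifier scores instead.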

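Multiclass Classification

- These approaches can work reasonably well.
- All pairs can be faster to train, since each classifier sees only the data for two classes; one-vs-rest is faster at making predictions, since it evaluates K classifiers rather than K(K-1)/2.
- sklearn applies one-vs-rest automatically when you give more than two classes to many of its binary linear classifiers (some estimators, such as SVC, use all pairs instead).
- Next we'll see how logistic regression can handle multiple classes without having to combine different binary classifiers.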

Logistic Regression

Before: binary logistic regression used the logistic function to give the probability that an instance belonged to the positive class.

$$P(y_i = 1 \mid x_i) = \frac{1}{1 + \exp(-w^T x_i)}$$

Logistic Regression

Multinomial (or multivariate) logistic regression uses a similar but more general function (the softmax function) for the probability of each of K classes:

$$P(y_i = k \mid x_i) = \frac{\exp(w_k^T x_i)}{\sum_{k'=1}^{K} \exp(w_{k'}^T x_i)}$$
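A minimal sketch of the softmax function in plain Python. The input scores stand for the K values $w_k^T x_i$; the numbers used here are borrowed from the one-vs-rest score table purely for illustration and are not the scores behind any probabilities quoted in the lecture. Subtracting the maximum score before exponentiating is a standard numerical-stability step that does not change the result.

```python
import math


def softmax(scores):
    """Map a list of K real-valued scores to K probabilities that sum to 1."""
    m = max(scores)  # shift by the max so exp() cannot overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


classes = ["Calico", "Orange Tabby", "Tuxedo", "Gray Tabby"]
scores = [-4.59, 2.18, -1.80, 0.73]  # illustrative values of w_k^T x_i

probs = softmax(scores)
for name, p in zip(classes, probs):
    print(f"{name}: {p:.3f}")
```

The class with the highest score always receives the highest probability, so the predicted class is the same whether you compare scores or probabilities.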

Logistic Regression

Binary:
- One weight vector w
- Score plugged into the logistic function to get a value in [0, 1]
- Probability of the negative class is just 1 minus the probability of the positive class

Multinomial:
- K weight vectors w_k
- Vector of K scores plugged into the softmax function to get a vector of K values, each in [0, 1], that sum to 1
- Each class probability is computed from its own score, from its own weight vector, normalized by the scores of all classes
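To see how the two columns relate, it may help to write out the softmax for K = 2. The class labels 0 and 1 and the weight vectors $w_0$, $w_1$ are notation assumed here for the derivation; the slides index classes from 1 to K.

```latex
\begin{aligned}
P(y_i = 1 \mid x_i)
  &= \frac{\exp(w_1^T x_i)}{\exp(w_0^T x_i) + \exp(w_1^T x_i)} \\
  &= \frac{1}{1 + \exp\!\left(-(w_1 - w_0)^T x_i\right)}
\end{aligned}
```

Setting $w = w_1 - w_0$ recovers the binary logistic function, so with two classes only the difference between the weight vectors matters.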

Logistic Regression

What color is the cat in this photo?

Class           Probability
Calico          0.03
Orange Tabby    0.62
Tuxedo          0.04
Gray Tabby      0.11

Orange Tabby has the highest probability.

Logistic Regression

- The weights can be learned with gradient descent, just like in the binary version.
- The loss function is the negative log-likelihood of the training data, as before.
- We won't go into the details in this class, but the updates look similar to what you've seen.
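For readers who want to see what such an update looks like, here is a small sketch of batch gradient descent on the average negative log-likelihood, using the five training rows from the slides. It relies on the standard gradient for softmax regression, (predicted probability minus 1 if the class is the true one) times the feature vector. The learning rate, the number of steps, the zero initialization, and the absence of a bias term are all simplifying assumptions made here, not details from the lecture.

```python
import math

# The five training rows from the slides.
X = [
    [1.01, -4.26, 7.99, -0.03],
    [2.50, 1.00, 4.87, 5.95],
    [-2.34, -1.24, -0.88, -1.31],
    [0.55, 0.59, -3.08, 1.27],
    [2.08, -3.46, 4.62, -1.13],
]
y_names = ["Calico", "Orange Tabby", "Tuxedo", "Orange Tabby", "Gray Tabby"]
classes = sorted(set(y_names))
y = [classes.index(name) for name in y_names]
K, D, N = len(classes), len(X[0]), len(X)


def softmax(scores):
    m = max(scores)  # shift for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(W, x):
    """Class probabilities for one instance; W[k] is the weight vector of class k."""
    return softmax([sum(w * v for w, v in zip(W[k], x)) for k in range(K)])


def negative_log_likelihood(W):
    """Average negative log-probability assigned to the true classes."""
    return -sum(math.log(predict_proba(W, x)[t]) for x, t in zip(X, y)) / N


def gradient_step(W, learning_rate):
    """One batch gradient descent update of all K weight vectors."""
    grad = [[0.0] * D for _ in range(K)]
    for x, t in zip(X, y):
        p = predict_proba(W, x)
        for k in range(K):
            error = p[k] - (1.0 if k == t else 0.0)
            for d in range(D):
                grad[k][d] += error * x[d] / N
    return [[W[k][d] - learning_rate * grad[k][d] for d in range(D)]
            for k in range(K)]


W = [[0.0] * D for _ in range(K)]  # assumed starting point: all zeros
initial_loss = negative_log_likelihood(W)
for _ in range(500):
    W = gradient_step(W, learning_rate=0.01)
final_loss = negative_log_likelihood(W)

print("Loss before training:", initial_loss)
print("Loss after training: ", final_loss)
```

With all-zero weights every class gets probability 1/K, so the starting loss is log K; the loss then falls as the weights are updated.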

Logistic Regression

Other names for multinomial logistic regression that you might encounter:
- Multiclass logistic regression
- Maximum entropy (MaxEnt) classifier
- Softmax regression

Multi-label Classification

What color and sex is the cat in this photo?

[Photos of three cats, labeled Calico / Female, Orange Tabby / Male, and Tuxedo / Male]

Multi-label Classification

Multi-label classification refers to the setting when there is more than one label you want to predict.

x1      x2      x3      x4      y1              y2
 1.01   -4.26    7.99   -0.03   Calico          Female
 2.50    1.00    4.87    5.95   Orange Tabby    Male
-2.34   -1.24   -0.88   -1.31   Tuxedo          Male
 0.55    0.59   -3.08    1.27   Orange Tabby    Male
 2.08   -3.46    4.62   -1.13   Gray Tabby      Female

Multi-label Classification

Starting point: train two separate classifiers.
- One predicts sex
- One predicts color

This might work fine, but there are some things to think about when doing this.

Multi-label Classification

Two independent classifiers might output combinations of labels that don't make sense.
- Calico cats are almost always female.
- If your classifiers predict male and calico, this is probably wrong.

There might be correlations between the classes that could help classification if you had a way to combine the two classifiers.
- Orange cats are more often male (~80% of the time).
- If your classifier(s) believed the cat was orange, this would increase the belief that it is male (or vice versa).

Multi-label Classification

One idea: train one classifier first, use its output as a feature in the other.

Example: first train a classifier to predict color:

x1      x2      x3      x4      y
 1.01   -4.26    7.99   -0.03   Calico
 2.50    1.00    4.87    5.95   Orange Tabby
-2.34   -1.24   -0.88   -1.31   Tuxedo
 0.55    0.59   -3.08    1.27   Orange Tabby
 2.08   -3.46    4.62   -1.13   Gray Tabby

Then train a classifier to predict sex, using the predicted color as an additional feature:

x1      x2      x3      x4      x5               y
 1.01   -4.26    7.99   -0.03   Calico?          Female
 2.50    1.00    4.87    5.95   Orange Tabby?    Male
-2.34   -1.24   -0.88   -1.31   Tuxedo?          Male
 0.55    0.59   -3.08    1.27   Orange Tabby?    Male
 2.08   -3.46    4.62   -1.13   Gray Tabby?      Female

Limitations:
- If the first classifier is wrong, you'll have an incorrect feature value.
- This is a pipeline approach where one classifier informs the other, rather than both informing each other simultaneously.
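The pipeline can be sketched end to end with a deliberately simple stand-in classifier. Everything below is an assumption made for illustration: the lecture does not say which classifier to use, so a 1-nearest-neighbor rule is swapped in to keep the example short, and the predicted color is one-hot encoded so it can be appended to the numeric features.

```python
# Training data from the slides: four features, color label, sex label.
X = [
    [1.01, -4.26, 7.99, -0.03],
    [2.50, 1.00, 4.87, 5.95],
    [-2.34, -1.24, -0.88, -1.31],
    [0.55, 0.59, -3.08, 1.27],
    [2.08, -3.46, 4.62, -1.13],
]
colors = ["Calico", "Orange Tabby", "Tuxedo", "Orange Tabby", "Gray Tabby"]
sexes = ["Female", "Male", "Male", "Male", "Female"]
color_names = sorted(set(colors))


def nearest_neighbor_predict(train_X, train_y, x):
    """Stand-in classifier: return the label of the closest training row."""
    def squared_distance(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    best = min(range(len(train_X)), key=lambda i: squared_distance(train_X[i], x))
    return train_y[best]


def one_hot(color):
    """Encode a color name as a 0/1 vector so it can be used as features."""
    return [1.0 if color == c else 0.0 for c in color_names]


def predict_color(x):
    """Stage 1: predict color from the original features."""
    return nearest_neighbor_predict(X, colors, x)


# Stage 2 training set: original features plus the *predicted* color.
X_augmented = [x + one_hot(predict_color(x)) for x in X]


def predict_color_and_sex(x):
    """Run the pipeline: color first, then sex using the predicted color."""
    color = predict_color(x)
    sex = nearest_neighbor_predict(X_augmented, sexes, x + one_hot(color))
    return color, sex


print(predict_color_and_sex([1.0, -4.0, 8.0, 0.0]))
```

Note that the second stage only ever sees the first stage's prediction, so any color mistake is passed straight through, which is the first limitation listed above.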

Multi-label Classification

Another idea: treat combinations of classes as their own classes, then do single-label classification.

x1      x2      x3      x4      y
 1.01   -4.26    7.99   -0.03   Calico + Female
 2.50    1.00    4.87    5.95   Orange Tabby + Male
-2.34   -1.24   -0.88   -1.31   Tuxedo + Male
 0.55    0.59   -3.08    1.27   Orange Tabby + Male
 2.08   -3.46    4.62   -1.13   Gray Tabby + Female

This way you can learn that Calico + Male is very unlikely, etc.

Limitations:
- All classes are learned independently: the classifier has no idea that Tuxedo + Male and Tuxedo + Female are both the same color and therefore probably have similar feature weights.
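Building the combined classes is a small preprocessing step, sketched below with the labels from the slides. The `" + "` separator and the helper name `split_label` are choices made here for illustration.

```python
colors = ["Calico", "Orange Tabby", "Tuxedo", "Orange Tabby", "Gray Tabby"]
sexes = ["Female", "Male", "Male", "Male", "Female"]

# Merge the two labels of each row into a single class label.
combined = [f"{color} + {sex}" for color, sex in zip(colors, sexes)]

# Classes actually seen in training vs. all combinations that could exist.
combined_classes = sorted(set(combined))
n_possible = len(set(colors)) * len(set(sexes))


def split_label(label):
    """Recover the (color, sex) pair from a combined class label."""
    color, sex = label.split(" + ")
    return color, sex


print("Observed combined classes:", combined_classes)
print("Possible combinations:", n_possible)
print(split_label("Tuxedo + Male"))
```

Any single-label multiclass method can then be trained on `combined`. A combination that never appears in the training data, such as Calico + Male here, cannot be predicted at all, and the number of possible classes grows multiplicatively with each added label.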

Summary

- Multiclass and multi-label situations arise often.
- Some simple solutions exist that are often effective.
- More sophisticated solutions exist; some we will see later in the semester.
- Don't confuse multiclass and multi-label! They are independent concepts. Something can be multiclass but not multi-label, or vice versa, or both, or neither.