Development of intelligent systems (RInS): Object recognition with Convolutional Neural Networks


Development of intelligent systems (RInS): Object recognition with Convolutional Neural Networks. Danijel Skočaj, University of Ljubljana, Faculty of Computer and Information Science. Academic year: 2017/18

Media hype

Superior performance: ILSVRC results, the deep learning era

History: Perceptron. Frank Rosenblatt, ~1957: Perceptron. The Mark I Perceptron machine was the first implementation of the perceptron algorithm. The machine was connected to a camera that used a 20x20 grid of cadmium sulfide photocells to produce a 400-pixel image; it recognized letters of the alphabet and was trained with the perceptron update rule. Slide credit: Fei-Fei Li, Andrej Karpathy, Justin Johnson

History: biological evidence. Hubel & Wiesel: 1959, Receptive fields of single neurons in the cat's striate cortex; 1962, Receptive fields, binocular interaction and functional architecture in the cat's visual cortex; 1968, ... Slide credit: Fei-Fei Li, Andrej Karpathy, Justin Johnson

History: LeNet-5. Gradient-based learning applied to document recognition [LeCun, Bottou, Bengio, Haffner, 1998]. Slide credit: Fei-Fei Li, Andrej Karpathy, Justin Johnson

History: AlexNet, the first strong results. Context-Dependent Pre-trained Deep Neural Networks for Large Vocabulary Speech Recognition, George Dahl, Dong Yu, Li Deng, Alex Acero, 2010. ImageNet classification with deep convolutional neural networks, Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, 2012. Slide credit: Fei-Fei Li, Andrej Karpathy, Justin Johnson

Beginning of the deep learning era: More data! More computational power! Improved training techniques!

The main concept. Zeiler and Fergus, 2014.

End-to-end learning: the representations as well as the classifier are learned.

Perceptron. Rosenblatt, 1957. Binary inputs and output; weights; threshold; bias. Very simple!
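
The formulas on this slide are images in the original; a standard reconstruction of the perceptron output and the classical update rule (as in Nielsen's book, cited under Literature) is:

```latex
% Perceptron: binary output from a thresholded weighted sum (b = -threshold)
\text{output} =
\begin{cases}
0 & \text{if } w \cdot x + b \le 0 \\
1 & \text{if } w \cdot x + b > 0
\end{cases}
\qquad
% Classical update rule, target y, prediction \hat{y}, learning rate \eta:
w \leftarrow w + \eta\,(y - \hat{y})\,x, \qquad b \leftarrow b + \eta\,(y - \hat{y})
```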

Sigmoid neurons. Real-valued inputs and outputs from the interval [0,1]. Activation function: the sigmoid function.
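
A reconstruction of the formula missing from the transcription: the sigmoid neuron computes the same weighted sum as the perceptron, but passes it through a smooth activation:

```latex
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad
\text{output} = \sigma(w \cdot x + b)
             = \frac{1}{1 + \exp\!\left(-\sum_j w_j x_j - b\right)}
```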

Sigmoid neurons. Small changes in the weights and biases cause small changes in the output. This enables learning!

Feedforward neural networks. Network architecture:

Example: recognizing digits. MNIST database of handwritten digits: 28x28 pixels (= 784 input neurons), 10 digits, 50,000 training images, 10,000 validation images, 10,000 test images.

Example code: feedforward. Code from https://github.com/mnielsen/neural-networks-and-deep-learning/archive/master.zip or https://github.com/mnielsen/neural-networks-and-deep-learning (git clone https://github.com/mnielsen/neural-networks-and-deep-learning.git).
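
A minimal sketch in the spirit of network.py from the repository above (abbreviated here, so treat the details as illustrative rather than the exact repository code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class Network:
    def __init__(self, sizes):
        # sizes, e.g. [784, 30, 10]: input layer, one hidden layer, output layer
        self.num_layers = len(sizes)
        self.biases = [np.random.randn(y, 1) for y in sizes[1:]]
        self.weights = [np.random.randn(y, x)
                        for x, y in zip(sizes[:-1], sizes[1:])]

    def feedforward(self, a):
        # Propagate the activation through every layer: a' = sigmoid(w a + b)
        for b, w in zip(self.biases, self.weights):
            a = sigmoid(np.dot(w, a) + b)
        return a

net = Network([784, 30, 10])
x = np.random.rand(784, 1)        # stand-in for a flattened 28x28 MNIST image
print(net.feedforward(x).shape)   # (10, 1): one output per digit class
```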

Loss function. Given the desired outputs for all training images, define the loss function as the mean square error (the quadratic loss). Find the weights w and biases b that, for a given input x, produce an output a that minimizes the loss function C.
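
The quadratic loss from the slide, written out (n training images x, desired output y(x), network output a):

```latex
C(w, b) = \frac{1}{2n} \sum_x \lVert y(x) - a \rVert^2
```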

Gradient descent. Find the minimum of C. The change of C is governed by the gradient of C; change v in the direction opposite to the gradient. Algorithm: initialize v; until the stopping criterion is reached, apply the update rule with learning rate η.
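
The formulas missing from the transcription, reconstructed in the standard form (cf. Nielsen, cited under Literature):

```latex
% Change of C for a small step \Delta v, the gradient, and the update rule
% that moves v against the gradient with learning rate \eta:
\Delta C \approx \nabla C \cdot \Delta v, \qquad
\nabla C = \left( \frac{\partial C}{\partial v_1}, \ldots,
                  \frac{\partial C}{\partial v_m} \right)^{\!T}, \qquad
v \to v' = v - \eta \nabla C
```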

Gradient descent in neural networks. The loss function and update rules consider all training samples; with very many parameters this is computationally very expensive. Use stochastic gradient descent instead.

Stochastic gradient descent. Compute the gradient only for a mini-batch, a subset of m training samples, and use this approximate gradient in the update rules. Training: 1. Initialize w and b. 2. In one epoch of training, keep randomly selecting one mini-batch of m samples at a time (and train on it) until all training images are used. 3. Repeat for several epochs.
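
The mini-batch approximation and update rules, written out:

```latex
% Average the gradient over a mini-batch X_1, ..., X_m of training samples,
% then update all weights and biases with this estimate:
\nabla C \approx \frac{1}{m} \sum_{j=1}^{m} \nabla C_{X_j}, \qquad
w_k \to w_k - \frac{\eta}{m} \sum_{j} \frac{\partial C_{X_j}}{\partial w_k}, \qquad
b_l \to b_l - \frac{\eta}{m} \sum_{j} \frac{\partial C_{X_j}}{\partial b_l}
```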

Example code: SGD.
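
A sketch of the SGD loop in the spirit of the repository code; it relies on an update_mini_batch helper, which is sketched after the backpropagation example code below:

```python
import random

def SGD(net, training_data, epochs, mini_batch_size, eta):
    # training_data: list of (x, y) pairs; eta: learning rate.
    # One epoch = shuffle the data, split it into mini-batches,
    # and take one averaged gradient step per mini-batch.
    n = len(training_data)
    for epoch in range(epochs):
        random.shuffle(training_data)
        mini_batches = [training_data[k:k + mini_batch_size]
                        for k in range(0, n, mini_batch_size)]
        for mini_batch in mini_batches:
            update_mini_batch(net, mini_batch, eta)  # defined further below
```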

Backpropagation. All we need is the gradient of the loss function: the rate of change of C with respect to a change in any weight, and the rate of change of C with respect to a change in any bias. How to compute the gradient? Numerically: simple, approximate, extremely slow. Analytically, for the entire C: fast, exact, intractable. By chaining analytic derivatives of the individual parts of the network: fast, exact, doable. Backpropagation!

Main principle. We need the gradient of the loss function. Two phases: Forward pass (propagation): the input sample is propagated through the network and the error at the final layer is obtained. Backward pass (weight update): the error is backpropagated to the individual layers, the contribution of each individual neuron to the error is calculated, and the weights are updated accordingly.

Learning strategy. To obtain the gradient of the loss function: for every neuron in the network, calculate the error of this neuron (this error propagates through the network and contributes to the final error); backpropagate the final error to obtain all the neuron errors; obtain all the partial derivatives with respect to the weights and biases from these errors.

Equations of backpropagation. BP1: the error in the output layer. BP2: the error in terms of the error in the next layer. BP3: the rate of change of the cost with respect to any bias. BP4: the rate of change of the cost with respect to any weight.
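
The four equations themselves are images on the slide; in the notation of Nielsen's book (cited under Literature), with δ^l the error in layer l, z^l the weighted inputs, a^l the activations, L the output layer, and ⊙ the elementwise product:

```latex
\text{(BP1)} \quad \delta^L = \nabla_a C \odot \sigma'(z^L) \\
\text{(BP2)} \quad \delta^l = \left( (w^{l+1})^T \delta^{l+1} \right) \odot \sigma'(z^l) \\
\text{(BP3)} \quad \frac{\partial C}{\partial b^l_j} = \delta^l_j \\
\text{(BP4)} \quad \frac{\partial C}{\partial w^l_{jk}} = a^{l-1}_k \, \delta^l_j
```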

Backpropagation algorithm. Input x: set the corresponding activation for the input layer. Feedforward: for each layer, compute the weighted inputs and activations. Output error: compute the error at the output layer. Backpropagate the error: for each layer, going backward, compute the layer error. Output the gradient.

Backpropagation and SGD. For a number of epochs: until all training images are used, select a mini-batch of training samples; for each training sample in the mini-batch: set the corresponding input activation, feedforward (computing the weighted inputs and activations for each layer), compute the output error, and backpropagate it through all layers; then, by gradient descent, update all the weights and biases.

Example code: backpropagation.
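
A sketch of backpropagation for the quadratic cost, building on the Network and sigmoid definitions from the feedforward sketch above, together with the update_mini_batch helper promised earlier (abbreviated in the spirit of the repository code):

```python
import numpy as np

def sigmoid_prime(z):
    return sigmoid(z) * (1 - sigmoid(z))

def backprop(net, x, y):
    # Returns (nabla_b, nabla_w): per-layer gradients of the quadratic cost.
    nabla_b = [np.zeros(b.shape) for b in net.biases]
    nabla_w = [np.zeros(w.shape) for w in net.weights]
    # Forward pass: store all weighted inputs z and activations a.
    activation, activations, zs = x, [x], []
    for b, w in zip(net.biases, net.weights):
        z = np.dot(w, activation) + b
        zs.append(z)
        activation = sigmoid(z)
        activations.append(activation)
    # Output error (BP1) and its gradients (BP3, BP4).
    delta = (activations[-1] - y) * sigmoid_prime(zs[-1])
    nabla_b[-1] = delta
    nabla_w[-1] = np.dot(delta, activations[-2].T)
    # Backpropagate the error (BP2) through the remaining layers.
    for l in range(2, net.num_layers):
        delta = np.dot(net.weights[-l + 1].T, delta) * sigmoid_prime(zs[-l])
        nabla_b[-l] = delta
        nabla_w[-l] = np.dot(delta, activations[-l - 1].T)
    return nabla_b, nabla_w

def update_mini_batch(net, mini_batch, eta):
    # Average the backprop gradients over the mini-batch, then take one step.
    nabla_b = [np.zeros(b.shape) for b in net.biases]
    nabla_w = [np.zeros(w.shape) for w in net.weights]
    for x, y in mini_batch:
        delta_b, delta_w = backprop(net, x, y)
        nabla_b = [nb + db for nb, db in zip(nabla_b, delta_b)]
        nabla_w = [nw + dw for nw, dw in zip(nabla_w, delta_w)]
    net.weights = [w - (eta / len(mini_batch)) * nw
                   for w, nw in zip(net.weights, nabla_w)]
    net.biases = [b - (eta / len(mini_batch)) * nb
                  for b, nb in zip(net.biases, nabla_b)]
```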

Locality of computation. [Diagram: a node f receives input activations; on the backward pass the incoming gradient is combined with the node's local gradient and passed back.] Slide credit: Fei-Fei Li, Andrej Karpathy, Justin Johnson

Activation and loss functions. Matching pairs of output activation and loss function: linear activation with the quadratic loss; sigmoid with cross-entropy; softmax with log-likelihood.

Activation functions: sigmoid; tanh: tanh(x); ReLU: max(0, x); Leaky ReLU: max(0.1x, x); ELU. Slide credit: Fei-Fei Li, Andrej Karpathy, Justin Johnson
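
The listed activations as plain numpy expressions (the ELU scale a = 1.0 is the usual default, an assumption here):

```python
import numpy as np

def sigmoid(x):    return 1.0 / (1.0 + np.exp(-x))
def tanh(x):       return np.tanh(x)
def relu(x):       return np.maximum(0.0, x)
def leaky_relu(x): return np.maximum(0.1 * x, x)
def elu(x, a=1.0): return np.where(x > 0, x, a * (np.exp(x) - 1.0))
```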

Overfitting. A huge number of parameters brings a danger of overfitting. Use a validation set to detect overfitting and stop early (the hold-out method). [Plots: overfitting and early stopping with 1,000 vs. 50,000 MNIST training images.]

Regularization. How to avoid overfitting: increase the number of training images, decrease the number of parameters, or use regularization. Regularization techniques: L2 regularization, L1 regularization, dropout, data augmentation.

L2 regularisation. Add a regularisation term to the loss function: the L2 norm of the weights, scaled by the regularisation parameter.

Weight decay. The regularised loss function changes the partial derivatives and hence the update rules: each weight is scaled down (decayed) at every update.
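
The formulas missing from the transcription, reconstructed for L2 regularisation (C_0 is the unregularised loss, λ the regularisation parameter, n the number of training samples):

```latex
C = C_0 + \frac{\lambda}{2n} \sum_w w^2, \qquad
\frac{\partial C}{\partial w} = \frac{\partial C_0}{\partial w} + \frac{\lambda}{n} w, \qquad
w \to \left(1 - \frac{\eta \lambda}{n}\right) w - \eta \frac{\partial C_0}{\partial w}
% The factor (1 - \eta\lambda/n) < 1 shrinks ("decays") every weight at each step.
```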

Dropout. Randomly (and temporarily) delete half (or a fraction p) of the hidden neurons in the network, then restore the neurons and repeat the process. Halve the weights when running the full network at test time, or equivalently double the weights during learning. A form of ensemble learning: training multiple networks and averaging the results. Reduces complex co-adaptations of neurons; smaller models are harder to overfit. Usually significantly improves the results.
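
A minimal numpy sketch of the classic dropout variant described on the slide (mask during training, rescale at test time):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_train(a, p=0.5):
    # Training: randomly (and temporarily) zero a fraction p of the hidden
    # activations; a fresh mask is drawn for every forward pass.
    mask = rng.random(a.shape) >= p
    return a * mask

def dropout_test(a, p=0.5):
    # Test: run the full network but scale activations by (1 - p) -- the
    # "halve the weights" correction from the slide when p = 0.5.
    return a * (1.0 - p)
```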

Data augmentation. Use more data! Synthetically generate new data by applying different kinds of transformations: translations, rotations, elastic distortions, appearance modifications (intensity, blur). The operations should reflect real-world variation.

Weight initialization. Ad-hoc normalization: initialize the weights with N(0,1); the variance of z grows with n_in, so many large z values mean many saturated neurons and slow learning. Better initialization: normalize the variance by initializing the weights with N(0, 1/n_in); the total variance is limited, so learning is faster! In the case of ReLU, initialize with N(0, 1/(n_in/2)). Even better: batch normalization.
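
A quick numerical illustration of the three schemes from the slide (the layer sizes here are arbitrary):

```python
import numpy as np

n_in = 784  # fan-in, e.g. a layer fed by a flattened MNIST image

w_naive  = np.random.randn(30, n_in)                      # N(0,1): var(z) grows with n_in
w_scaled = np.random.randn(30, n_in) / np.sqrt(n_in)      # N(0, 1/n_in)
w_relu   = np.random.randn(30, n_in) / np.sqrt(n_in / 2)  # N(0, 1/(n_in/2)) for ReLU

x = np.random.randn(n_in, 1)
print(np.std(w_naive @ x))   # ~28: many large z => saturated neurons, slow learning
print(np.std(w_scaled @ x))  # ~1: limited variance, faster learning
```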

Parameter updates. Different schemes for applying the gradient update: gradient descent, momentum update, Nesterov momentum, AdaGrad, RMSProp, Adam; learning rate decay. Image credits: Alec Radford
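
The update formulas are shown as images on the slide; sketches of the first two schemes for a parameter vector w with gradient g (the others follow the same pattern):

```python
def sgd_step(w, g, lr=0.01):
    # Plain gradient descent: step against the gradient.
    return w - lr * g

def momentum_step(w, g, v, lr=0.01, mu=0.9):
    # Momentum: the velocity v accumulates a decaying sum of past gradients.
    v = mu * v - lr * g
    return w + v, v
```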

Example code: better weight initialization, cross-entropy cost, regularization.

Setting up the network. Coarse-to-fine cross-validation in stages: only a few epochs to get a rough idea, even on a smaller problem to speed up the process; then longer running times and a finer search. Cross-validation strategy: check various parameter settings, always sample the parameters, check the results, and adjust the range. Hyperparameters to play with: the network architecture; the learning rate, its decay schedule, and the update type; regularization (L2/dropout strength). Run multiple validations simultaneously and actively observe the learning progress. Slide credit: Fei-Fei Li, Andrej Karpathy, Justin Johnson

Convolutional neural networks: from feedforward fully-connected neural networks to convolutional neural networks.

Convolution example. The convolution operation: discrete convolution; two-dimensional convolution; convolution is commutative; cross-correlation is the same operation with an unflipped kernel.
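
The formulas, reconstructed in the notation of Goodfellow et al. (cited under Literature):

```latex
% Discrete convolution and its two-dimensional form:
s(t) = (x * w)(t) = \sum_{a=-\infty}^{\infty} x(a)\, w(t - a) \\
S(i, j) = (I * K)(i, j) = \sum_m \sum_n I(m, n)\, K(i - m, j - n) \\
% Commutativity (flip the kernel instead of the input):
(I * K)(i, j) = (K * I)(i, j) = \sum_m \sum_n I(i - m, j - n)\, K(m, n) \\
% Cross-correlation: the same sliding dot product, but with an unflipped kernel:
S(i, j) = (I \star K)(i, j) = \sum_m \sum_n I(i + m, j + n)\, K(m, n)
```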

Convolution layer. A 32x32x3 image (width x height x depth) and a 5x5x3 filter; filters always extend the full depth of the input volume. Convolve the filter with the image, i.e. slide it over the image spatially, computing dot products. Slide credit: Fei-Fei Li, Andrej Karpathy, Justin Johnson

Convolution layer. Convolving (sliding) a 5x5x3 filter over all spatial locations of a 32x32x3 image produces a 28x28 activation map; each number is the result of taking a dot product between the filter and a small 5x5x3 chunk of the image (i.e. a 5*5*3 = 75-dimensional dot product plus a bias). Several filters can be used! Slide credit: Fei-Fei Li, Andrej Karpathy, Justin Johnson

Convolution layer. 6 filters -> 6 activation maps: we stack these up to get a new image of size 28x28x6! Slide credit: Fei-Fei Li, Andrej Karpathy, Justin Johnson

Convolutional neural network. A ConvNet is a sequence of convolutional layers, interspersed with activation functions: e.g. a 32x32x3 input, CONV + ReLU with 6 5x5x3 filters (-> 28x28x6), CONV + ReLU with 10 5x5x6 filters (-> 24x24x10), and so on. Slide credit: Fei-Fei Li, Andrej Karpathy, Justin Johnson

Sparse connectivity. Local connectivity: neurons are only locally connected (receptive field). This reduces memory requirements, improves statistical efficiency, and requires fewer operations. [Figures: connectivity viewed from below and from above.] The receptive field of the units in the deeper layers is large => indirect connections!

Parameter sharing. Neurons share weights (tied weights): every element of the kernel is used at every position of the input, so all the neurons at the same level detect the same feature (everywhere in the input). This greatly reduces the number of parameters! Equivariance to translation: shift then convolution = convolution then shift; the object moves => the representation moves. Equivalent view: a fully connected network with an infinitely strong prior over its weights (tied weights that are zero outside the kernel region), so it learns only local interactions and is equivariant to translations.

Convolutional neural network [From recent Yann LeCun slides] Slide credit: Fei-Fei Li, Andrej Karpathy, Justin Johnson

Convolutional neural network: one filter => one activation map. Example: 5x5 filters (32 total) applied to an input image. Slide credit: Fei-Fei Li, Andrej Karpathy, Justin Johnson

Stride. A 7x7 input (N=7) with a 3x3 filter (F=3): stride 1 => 5x5 output; stride 2 => 3x3 output. Output size: (N - F) / stride + 1. Slide credit: Fei-Fei Li, Andrej Karpathy, Justin Johnson

Zero padding. Extend the image to allow processing of the border pixels. E.g. a 7x7 input with a 3x3 filter applied at stride 1, padded with a 1-pixel border => a 7x7 output! To preserve the size with stride 1 and filter size FxF, zero-pad with (F-1)/2: F = 3 => zero-pad with 1; F = 5 => zero-pad with 2; F = 7 => zero-pad with 3. Slide credit: Fei-Fei Li, Andrej Karpathy, Justin Johnson

Conv layer parameters. Common settings: K = powers of 2 (e.g. 32, 64, 128, 512); F = 3, S = 1, P = 1; F = 5, S = 1, P = 2; F = 5, S = 2, P = whatever fits; F = 1, S = 1, P = 0. Slide credit: Fei-Fei Li, Andrej Karpathy, Justin Johnson
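
A small helper to sanity-check these settings against the output-size formula from the Stride slide, extended with padding (a sketch; the function and variable names are ours):

```python
def conv_output_size(n, f, s, p):
    # Spatial output size of a conv layer: (N - F + 2P) / S + 1.
    assert (n - f + 2 * p) % s == 0, "filter does not tile the input evenly"
    return (n - f + 2 * p) // s + 1

print(conv_output_size(32, 3, 1, 1))  # 32: F=3, S=1, P=1 preserves the size
print(conv_output_size(32, 5, 1, 2))  # 32: F=5, S=1, P=2 preserves the size
print(conv_output_size(7, 3, 2, 0))   # 3: the stride-2 example from the Stride slide
```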

Pooling layer: makes the representations smaller and more manageable; operates over each activation map independently (downsampling). Example: max pooling. Slide credit: Fei-Fei Li, Andrej Karpathy, Justin Johnson
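
A minimal numpy sketch of 2x2 max pooling with stride 2 on a single activation map:

```python
import numpy as np

def max_pool(a, f=2):
    # Non-overlapping f x f max pooling (stride = f) on one activation map;
    # assumes both dimensions of a are divisible by f.
    h, w = a.shape
    return a.reshape(h // f, f, w // f, f).max(axis=(1, 3))

a = np.arange(16).reshape(4, 4)
print(max_pool(a))  # [[ 5  7]
                    #  [13 15]]
```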

Pooling. Max pooling introduces translation invariance. Pooling with downsampling reduces the representation size, reduces the computational cost, and increases statistical efficiency.

Pooling layer parameters. Common settings: F = 2, S = 2; F = 3, S = 2. Slide credit: Fei-Fei Li, Andrej Karpathy, Justin Johnson

CNN layers. Layers used to build ConvNets: INPUT: raw pixel values. CONV: convolutional layer. ReLU: introduces nonlinearity. POOL: downsampling. FC: fully-connected layer for computing class scores.

CNN architecture. Stack the layers in an appropriate order. Babenko et al.; Hu et al.

CNN architecture. Slide credit: Fei-Fei Li, Andrej Karpathy, Justin Johnson

Case study: LeNet-5 [LeCun et al., 1998]. Conv filters were 5x5, applied at stride 1; subsampling (pooling) layers were 2x2, applied at stride 2; i.e. the architecture is [CONV-POOL-CONV-POOL-CONV-FC]. Slide credit: Fei-Fei Li, Andrej Karpathy, Justin Johnson

Case study: AlexNet [Krizhevsky et al., 2012]. Layers: INPUT, CONV1, POOL1, NORM1, CONV2, POOL2, NORM2, CONV3, CONV4, CONV5, POOL3, FC6, FC7, FC8. http://fromdata.org/2015/10/01/imagenet-cnn-architecture-image/ Slide credit: Fei-Fei Li, Andrej Karpathy, Justin Johnson

Case study: VGGNet [Simonyan and Zisserman, 2014]. Only 3x3 CONV at stride 1, pad 1, and 2x2 MAX POOL at stride 2. Best model: 11.2% top-5 error in ILSVRC 2013 -> 7.3% top-5 error. Slide credit: Fei-Fei Li, Andrej Karpathy, Justin Johnson

Case study: GoogLeNet [Szegedy et al., 2014]. Inception module. ILSVRC 2014 winner (6.7% top-5 error). Slide credit: Fei-Fei Li, Andrej Karpathy, Justin Johnson

Case study: ResNet [He et al.]. Spatial dimension only 56x56! Batch normalization after every CONV layer; Xavier/2 initialization from He et al.; SGD + momentum (0.9); learning rate 0.1, divided by 10 when the validation error plateaus; mini-batch size 256; weight decay of 1e-5; no dropout used. ILSVRC 2015 winner (3.6% top-5 error). Slide credit: Fei-Fei Li, Andrej Karpathy, Justin Johnson

Case study: Inception-v4 [Szegedy et al., 2016]. 75 layers; 3.6% top-5 error. Slide credit: Fei-Fei Li, Andrej Karpathy, Justin Johnson

Analysis of DNN models [Canziani et al., 2017]

Transfer learning. If you don't have enough data, use pretrained models! 1. Train on ImageNet. 2. Small dataset: use the network as a feature extractor (freeze the lower layers, train only the top layer). 3. Medium dataset: fine-tuning; more data = retrain more of the network (or all of it). Tip: use only ~1/10th of the original learning rate when fine-tuning the top layer, and ~1/100th on the intermediate layers. Slide credit: Fei-Fei Li, Andrej Karpathy, Justin Johnson
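
A sketch of recipe 2 (small dataset: feature extractor), here using PyTorch/torchvision purely as one possible framework choice; the lecture itself names Theano/TensorFlow, and the 10-class head is an arbitrary assumption:

```python
import torch.nn as nn
from torchvision import models

model = models.resnet18(pretrained=True)        # 1. pretrained on ImageNet
for param in model.parameters():
    param.requires_grad = False                 # 2. freeze the pretrained layers
model.fc = nn.Linear(model.fc.in_features, 10)  # new top layer, 10 classes (assumed)

# For recipe 3 (medium dataset: fine-tuning), unfreeze more of the network and
# follow the slide's tip: ~1/10th of the original learning rate on the top
# layer, ~1/100th on the intermediate layers.
```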

Wide usability of ConvNets. Classification, retrieval [Krizhevsky 2012]. Slide credit: Fei-Fei Li, Andrej Karpathy, Justin Johnson

Wide usability of ConvNets. Detection [Faster R-CNN: Ren, He, Girshick, Sun 2015]; segmentation [Farabet et al., 2012]. Slide credit: Fei-Fei Li, Andrej Karpathy, Justin Johnson

Wide usability of ConvNets. Self-driving cars (NVIDIA Tegra X1). Slide credit: Fei-Fei Li, Andrej Karpathy, Justin Johnson

Wide usability of ConvNets. [Taigman et al. 2014] [Simonyan et al. 2014] [Goodfellow 2014] Slide credit: Fei-Fei Li, Andrej Karpathy, Justin Johnson

Wide usability of ConvNets. [Toshev, Szegedy 2014] [Mnih 2013] Slide credit: Fei-Fei Li, Andrej Karpathy, Justin Johnson

Wide usability of ConvNets. [Ciresan et al. 2013] [Sermanet et al. 2011] [Ciresan et al.] Slide credit: Fei-Fei Li, Andrej Karpathy, Justin Johnson

Wide usability of ConvNets. [Denil et al. 2014] [Turaga et al., 2010] Slide credit: Fei-Fei Li, Andrej Karpathy, Justin Johnson

Wide usability of ConvNets. Whale recognition (Kaggle challenge); Mnih and Hinton, 2010. Slide credit: Fei-Fei Li, Andrej Karpathy, Justin Johnson

Wide usability of ConvNets. Image captioning [Vinyals et al., 2015]. Slide credit: Fei-Fei Li, Andrej Karpathy, Justin Johnson

Wide usability of ConvNets. reddit.com/r/deepdream Slide credit: Fei-Fei Li, Andrej Karpathy, Justin Johnson

Literature. Michael A. Nielsen, Neural Networks and Deep Learning, Determination Press, 2015, http://neuralnetworksanddeeplearning.com/index.html. Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning, MIT Press, 2016, http://www.deeplearningbook.org/. Fei-Fei Li, Andrej Karpathy, Justin Johnson, CS231n: Convolutional Neural Networks for Visual Recognition, Stanford University, 2016, http://cs231n.stanford.edu/. Papers.

Software. Neural networks in Python; convolutional neural networks using Theano, TensorFlow, or other deep learning frameworks.