Development of intelligent systems (RInS): Object recognition with Convolutional Neural Networks. Danijel Skočaj, University of Ljubljana, Faculty of Computer and Information Science. Academic year: 2017/18
Media hype
Superior performance: ILSVRC results, the deep learning era
History: Perceptron. Frank Rosenblatt, ~1957. The Mark I Perceptron machine was the first implementation of the perceptron algorithm. The machine was connected to a camera that used a 20x20 grid of cadmium sulfide photocells to produce a 400-pixel image. It recognized letters of the alphabet. Update rule: w_i(t+1) = w_i(t) + α (d_j - y_j(t)) x_{j,i}. Slide credit: Fei-Fei Li, Andrej Karpathy, Justin Johnson
History: biological evidence. Hubel & Wiesel: 1959, "Receptive fields of single neurons in the cat's striate cortex"; 1962, "Receptive fields, binocular interaction and functional architecture in the cat's visual cortex"; 1968...
History: LeNet-5. "Gradient-based learning applied to document recognition" [LeCun, Bottou, Bengio, Haffner 1998]
History: first strong results. "Context-Dependent Pre-trained Deep Neural Networks for Large Vocabulary Speech Recognition", George Dahl, Dong Yu, Li Deng, Alex Acero, 2010. "Imagenet classification with deep convolutional neural networks" (AlexNet), Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, 2012
Beginning of the Deep learning era: More data! More computational power! Improved learning details!
The main concept [Zeiler and Fergus, 2014]
End-to-end learning: the representations as well as the classifier are being learned
Perceptron. Rosenblatt, 1957. Binary inputs and output, weights, threshold, bias. Very simple!
Sigmoid neurons. Real inputs and outputs from the interval [0,1]. Activation function: the sigmoid function; output = 1 / (1 + exp(-Σ_j w_j x_j - b))
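A sigmoid neuron is small enough to write out directly. A minimal NumPy sketch (the function names are ours, not from the lecture code):

```python
import numpy as np

def sigmoid(z):
    # The sigmoid squashes any real input into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_neuron(x, w, b):
    # output = sigmoid(sum_j w_j * x_j + b)
    return sigmoid(np.dot(w, x) + b)
```

For example, `sigmoid_neuron(np.array([0.5, 0.8]), np.array([0.1, -0.3]), 0.2)` always lies strictly between 0 and 1, and a zero pre-activation gives exactly 0.5.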
Sigmoid neurons. Small changes in weights and biases cause small changes in the output. This enables learning!
Feedforward neural networks. Network architecture:
Example: recognizing digits. MNIST database of handwritten digits: 28x28 pixels (= 784 input neurons), 10 digits, 50,000 training images, 10,000 validation images, 10,000 test images
Example code: Feedforward. Code from https://github.com/mnielsen/neural-networks-and-deep-learning (or the archive at https://github.com/mnielsen/neural-networks-and-deep-learning/archive/master.zip): git clone https://github.com/mnielsen/neural-networks-and-deep-learning.git
Loss function. Given: the desired output y(x) for all training images x. Loss function (mean squared error, quadratic loss): C(w,b) = 1/(2n) Σ_x ||y(x) - a||^2. Find weights w and biases b that for a given input x produce an output a that minimizes the loss function C.
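The quadratic loss above can be computed directly; a small sketch (the function name is ours):

```python
import numpy as np

def quadratic_cost(outputs, targets):
    # C(w, b) = 1/(2n) * sum_x ||y(x) - a(x)||^2 over the n training samples,
    # where a(x) is the network output and y(x) the desired output
    n = len(outputs)
    return sum(0.5 * np.linalg.norm(y - a) ** 2
               for a, y in zip(outputs, targets)) / n
```

A perfect network (a = y everywhere) gives cost 0; a single sample missed by a unit distance gives 0.5.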
Gradient descent. Find the minimum of C(v). Change of C: ΔC ≈ ∇C · Δv. Gradient of C: ∇C = (∂C/∂v_1, ..., ∂C/∂v_m). Choose the change Δv in the opposite direction of the gradient: Δv = -η∇C. Algorithm: initialize v; until the stopping criterion is reached, apply the update rule v → v - η∇C, where η is the learning rate.
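The update rule is a two-line loop; a minimal sketch for a scalar (or NumPy vector) parameter:

```python
def gradient_descent(grad, v, eta=0.1, steps=200):
    # Repeatedly apply the update rule v <- v - eta * grad(v)
    for _ in range(steps):
        v = v - eta * grad(v)
    return v
```

For example, minimizing C(v) = v^2 (gradient 2v) from v = 5 converges to the minimum at 0: `gradient_descent(lambda v: 2.0 * v, 5.0)`.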
Gradient descent in neural networks. Loss function C(w,b). Update rules: w_k → w_k - η ∂C/∂w_k, b_l → b_l - η ∂C/∂b_l. Considering all training samples with very many parameters is computationally very expensive => use stochastic gradient descent instead.
Stochastic gradient descent. Compute the gradient only for a subset of m training samples. Mini-batch: X_1, X_2, ..., X_m. Approximate gradient: ∇C ≈ (1/m) Σ_j ∇C_{X_j}. Update rules: w_k → w_k - (η/m) Σ_j ∂C_{X_j}/∂w_k, b_l → b_l - (η/m) Σ_j ∂C_{X_j}/∂b_l. Training: 1. Initialize w and b. 2. In one epoch of training, keep randomly selecting one mini-batch of m samples at a time (and train) until all training images are used. 3. Repeat for several epochs.
Example code: SGD
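The epoch/mini-batch loop from the previous slide can be sketched as follows, in the spirit of Nielsen's network.py (here `update_mini_batch` is a stand-in for the actual gradient step, which we leave to the caller):

```python
import random

def sgd(training_data, epochs, mini_batch_size, eta, update_mini_batch):
    # Stochastic gradient descent: in each epoch, shuffle the data,
    # split it into mini-batches, and take one gradient step per batch.
    n = len(training_data)
    for _ in range(epochs):
        random.shuffle(training_data)
        batches = [training_data[k:k + mini_batch_size]
                   for k in range(0, n, mini_batch_size)]
        for batch in batches:
            update_mini_batch(batch, eta)
```

With 10 samples, batch size 2, and 3 epochs, `update_mini_batch` is called 15 times, each time with a batch of 2 samples.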
Backpropagation. All we need is the gradient of the loss function: the rate of change of C w.r.t. a change in any weight (∂C/∂w) and in any bias (∂C/∂b). How to compute the gradient? Numerically: simple, approximate, extremely slow. Analytically for the entire C: fast, exact, intractable. Chaining individual parts of the network: fast, exact, doable: backpropagation!
Main principle. We need the gradient of the loss function. Two phases: Forward pass (propagation): the input sample is propagated through the network and the error at the final layer is obtained. Backward pass (weight update): the error is backpropagated to the individual layers, the contribution of each neuron to the error is calculated, and the weights are updated accordingly.
Learning strategy. To obtain the gradient of the loss function: for every neuron in the network calculate the error δ of this neuron; this error propagates through the network, causing the final error. Backpropagate the final error to get all δ. Obtain all ∂C/∂w and ∂C/∂b from δ.
Equations of backpropagation. BP1, error in the output layer: δ^L = ∇_a C ⊙ σ'(z^L). BP2, error in terms of the error in the next layer: δ^l = ((w^{l+1})^T δ^{l+1}) ⊙ σ'(z^l). BP3, rate of change of the cost w.r.t. any bias: ∂C/∂b_j^l = δ_j^l. BP4, rate of change of the cost w.r.t. any weight: ∂C/∂w_{jk}^l = a_k^{l-1} δ_j^l.
Backpropagation algorithm. Input x: set the corresponding activation a^1 for the input layer. Feedforward: for each l = 2, 3, ..., L compute z^l = w^l a^{l-1} + b^l and a^l = σ(z^l). Output error: compute δ^L = ∇_a C ⊙ σ'(z^L). Backpropagate the error: for each l = L-1, ..., 2 compute δ^l = ((w^{l+1})^T δ^{l+1}) ⊙ σ'(z^l). Output the gradient: ∂C/∂w_{jk}^l = a_k^{l-1} δ_j^l and ∂C/∂b_j^l = δ_j^l.
Backpropagation and SGD. For a number of epochs, until all training images are used: select a mini-batch of training samples; for each training sample x in the mini-batch: input (set the corresponding activation a^{x,1}), feedforward (for each l compute z^{x,l} and a^{x,l}), output error (compute δ^{x,L}), backpropagation (for each l compute δ^{x,l}). Gradient descent: for each l update w^l → w^l - (η/m) Σ_x δ^{x,l} (a^{x,l-1})^T and b^l → b^l - (η/m) Σ_x δ^{x,l}.
Example code: Backpropagation
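The four backpropagation equations translate almost line by line into NumPy. A self-contained sketch for the quadratic cost, modelled on Nielsen's backprop routine (weights and biases are plain lists of arrays; column vectors throughout):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    return sigmoid(z) * (1.0 - sigmoid(z))

def backprop(x, y, weights, biases):
    # Returns (nabla_b, nabla_w): gradients of the quadratic cost for one sample.
    # Forward pass: store all weighted inputs z and activations a
    activation = x
    activations = [x]
    zs = []
    for w, b in zip(weights, biases):
        z = w @ activation + b
        zs.append(z)
        activation = sigmoid(z)
        activations.append(activation)
    # BP1: output error delta = (a - y) * sigma'(z^L)
    delta = (activations[-1] - y) * sigmoid_prime(zs[-1])
    nabla_b = [None] * len(biases)
    nabla_w = [None] * len(weights)
    nabla_b[-1] = delta                       # BP3
    nabla_w[-1] = delta @ activations[-2].T   # BP4
    # BP2: propagate the error backwards layer by layer
    for l in range(2, len(weights) + 1):
        delta = (weights[-l + 1].T @ delta) * sigmoid_prime(zs[-l])
        nabla_b[-l] = delta
        nabla_w[-l] = delta @ activations[-l - 1].T
    return nabla_b, nabla_w
```

The returned gradient arrays have exactly the same shapes as the corresponding weight and bias arrays, so the SGD update from the previous slide can be applied elementwise.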
Locality of computation. Each gate computes its output from its local input activations; in the backward pass it multiplies the gradient arriving from above by its local gradient and passes the result to the inputs below.
Activation and loss functions. Matching pairs:
Activation function | Loss function
Linear | Quadratic
Sigmoid | Cross-entropy
Softmax | Log-likelihood
Activation functions: Sigmoid 1/(1+e^(-x)); tanh(x); ReLU: max(0, x); Leaky ReLU: max(0.1x, x); ELU
Overfitting. Huge number of parameters -> danger of overfitting. Use a validation set to detect overfitting and apply early stopping (hold-out method). Illustrated for 1,000 and for 50,000 MNIST training images.
Regularization. How to avoid overfitting: increase the number of training images, decrease the number of parameters, regularization. Regularization techniques: L2 regularization, L1 regularization, dropout, data augmentation.
L2 regularisation. Add a regularisation term to the loss function: C = C_0 + (λ/2n) Σ_w w^2, where λ is the regularisation parameter and (λ/2n) Σ_w w^2 is the regularisation term (an L2 norm over the weights).
Weight decay. Loss function: C = C_0 + (λ/2n) Σ_w w^2. Partial derivatives: ∂C/∂w = ∂C_0/∂w + (λ/n) w, ∂C/∂b = ∂C_0/∂b. Update rules: w → (1 - ηλ/n) w - η ∂C_0/∂w (the factor (1 - ηλ/n) is the weight decay), b → b - η ∂C_0/∂b.
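The weight-decay update rule in code (a one-liner; the function name is ours):

```python
def sgd_update_l2(w, grad_w, eta, lam, n):
    # w <- (1 - eta*lam/n) * w - eta * dC0/dw : L2 regularisation first
    # shrinks ("decays") the weight, then applies the usual gradient step
    return (1.0 - eta * lam / n) * w - eta * grad_w
```

With a zero gradient the weight still shrinks: eta = 0.1, lam = 0.5, n = 1 decays w = 1.0 to 0.95 in one step.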
Dropout. Randomly (and temporarily) delete half (or a fraction p) of the hidden neurons in the network; then restore the neurons and repeat the process. Halve the weights when running the full network at test time, or equivalently double the weights during learning. A form of ensemble learning: training multiple networks and averaging the results. Reduces complex co-adaptations of neurons; smaller models are harder to overfit. Usually significantly improves the results.
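A sketch of the forward pass with dropout. This uses the "inverted dropout" variant, where survivors are rescaled by 1/(1-p) during training so that no weight rescaling is needed at test time; this is equivalent to the halve-at-test-time scheme described above:

```python
import numpy as np

def dropout(a, p, rng):
    # Zero each activation with probability p; scale the survivors by
    # 1/(1-p) so the expected activation is unchanged (inverted dropout)
    mask = (rng.random(a.shape) >= p) / (1.0 - p)
    return a * mask
```

With p = 0.5 roughly half the activations are zeroed and the rest are doubled; with p = 0 the input passes through unchanged.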
Data augmentation. Use more data! Synthetically generate new data: apply different kinds of transformations such as translations, rotations, elastic distortions, and appearance modifications (intensity, blur). Operations should reflect real-world variation.
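A minimal augmentation example: random translations, which are safe for digit images (unlike horizontal flips, which would change the label of digits such as 2 or 5). The function name and shift range are illustrative:

```python
import numpy as np

def random_shift(img, max_shift, rng):
    # Translate the image by up to max_shift pixels in each direction,
    # wrapping around at the borders (fine for near-empty MNIST margins)
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    return np.roll(img, (dy, dx), axis=(0, 1))
```

The output has the same shape and the same total intensity as the input; only the pixel positions change.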
Weight initialization. Ad-hoc normalization: initialize weights with N(0,1); the variance of z grows with n_in; many large z => many saturated neurons => slow learning. Better initialization: normalize the variance with n_in, i.e. initialize weights with N(0, 1/n_in); the total variance is limited => faster learning! In case of ReLU: initialize with N(0, 2/n_in). Even better: batch normalization.
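A small experiment illustrating the claim (sample sizes and names are ours): with weights drawn from N(0,1), the pre-activation z = Σ_j w_j x_j of a neuron with n_in = 1000 inputs has a standard deviation near √1000 ≈ 31.6, deep in the saturated region of the sigmoid; scaling the weights by 1/√n_in brings it back to around 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in = 1000

def z_std(w_scale, trials=2000):
    # Empirical std of z = sum_j w_j x_j for inputs x ~ N(0, 1)
    # and weights w ~ N(0, w_scale^2)
    zs = [rng.standard_normal(n_in) @ (w_scale * rng.standard_normal(n_in))
          for _ in range(trials)]
    return np.std(zs)

naive = z_std(1.0)                       # N(0,1) weights: std(z) ~ sqrt(n_in)
normalized = z_std(1.0 / np.sqrt(n_in))  # N(0,1/n_in) weights: std(z) ~ 1
```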
Parameter updates. Different schemes for updating the parameters with the gradient: gradient descent, momentum update, Nesterov momentum, AdaGrad update, RMSProp update, Adam update; learning rate decay. Image credits: Alec Radford
Example code: better weight initialization, cross-entropy cost, regularization
Setting up the network. Coarse-to-fine cross-validation in stages: only a few epochs to get a rough idea, even on a smaller problem to speed up the process; then longer running times and a finer search. Cross-validation strategy: check various parameter settings, always sample parameters, check the results, adjust the range. Hyperparameters to play with: network architecture; learning rate, its decay schedule, update type; regularization (L2/dropout strength). Run multiple validations simultaneously and actively observe the learning progress.
Convolutional neural networks: from feedforward fully-connected neural networks to convolutional neural networks
Convolution example. Convolution operation: s(t) = (x * w)(t) = ∫ x(a) w(t - a) da. Discrete convolution: s(t) = Σ_a x(a) w(t - a). Two-dimensional convolution: S(i,j) = (I * K)(i,j) = Σ_m Σ_n I(m,n) K(i-m, j-n). Convolution is commutative: S(i,j) = (K * I)(i,j) = Σ_m Σ_n I(i-m, j-n) K(m,n). Cross-correlation is the same operation without flipping the kernel: S(i,j) = Σ_m Σ_n I(i+m, j+n) K(m,n).
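The discrete 2-D convolution above, written out naively (only the "valid" region where the kernel fully overlaps the image; the function name is ours):

```python
import numpy as np

def conv2d_valid(I, K):
    # Discrete 2-D convolution: flip the kernel, then slide it over the
    # image; the flipping is what distinguishes convolution from
    # cross-correlation
    Kf = K[::-1, ::-1]
    H, W = I.shape
    h, w = Kf.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(I[i:i + h, j:j + w] * Kf)
    return out
```

A 3x3 kernel with a single 1 in the centre reproduces the image centre; an asymmetric kernel makes the flip visible.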
Convolution layer. A 32x32x3 image (width 32, height 32, depth 3) and a 5x5x3 filter; filters always extend the full depth of the input volume. Convolve the filter with the image, i.e. slide it over the image spatially, computing dot products.
Convolution layer. Convolve (slide) the 5x5x3 filter over all spatial locations of the 32x32x3 image to obtain a 28x28x1 activation map. Each number is the result of taking a dot product between the filter and a small 5x5x3 chunk of the image (i.e. a 5*5*3 = 75-dimensional dot product + bias). Use several filters!
Convolution layer. 6 filters -> 6 activation maps. We stack these up to get a new image of size 28x28x6!
Convolutional neural network. A ConvNet is a sequence of convolutional layers, interspersed with activation functions: e.g. a 32x32x3 input -> CONV, ReLU (six 5x5x3 filters) -> 28x28x6 -> CONV, ReLU (ten 5x5x6 filters) -> 24x24x10 -> CONV, ReLU, ...
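The shapes on these slides can be checked with a naive (unpadded, stride-1) convolution layer; a slow but explicit sketch, with the function name ours:

```python
import numpy as np

def conv_layer(x, filters, biases):
    # x: H x W x D input volume; filters: K x F x F x D (K filters of
    # size F x F x D). Each filter produces one activation map; the K
    # maps are stacked along the depth dimension.
    H, W, D = x.shape
    K, F = filters.shape[0], filters.shape[1]
    out = np.zeros((H - F + 1, W - F + 1, K))
    for k in range(K):
        for i in range(H - F + 1):
            for j in range(W - F + 1):
                # an F*F*D-dimensional dot product plus a bias
                out[i, j, k] = np.sum(x[i:i + F, j:j + F, :] * filters[k]) + biases[k]
    return out
```

Applied to a 32x32x3 input with six 5x5x3 filters, this produces exactly the 28x28x6 volume from the slide.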
Sparse connectivity. Local connectivity: neurons are only locally connected (receptive field). Reduces memory requirements, improves statistical efficiency, requires fewer operations. The receptive field of the units in the deeper layers is large => indirect connections!
Parameter sharing. Neurons share weights (tied weights): every element of the kernel is used at every position of the input, so all the neurons at the same level detect the same feature (everywhere in the input). Greatly reduces the number of parameters! Equivariance to translation: shift then convolution = convolution then shift; if the object moves, the representation moves. Equivalent to a fully connected network with an infinitely strong prior over its weights: weights are zero outside the kernel region and tied inside it => the network learns only local interactions and is equivariant to translations.
Convolutional neural network [From recent Yann LeCun slides]
Convolutional neural network. One filter => one activation map. Example: 5x5 filters (32 total) applied to an input image.
Stride. 7x7 input (N = 7), 3x3 filter (F = 3). Stride 1 => 5x5 output. Stride 2 => 3x3 output. Output size: (N - F) / stride + 1.
Zero padding. Extend the image to allow processing of the border pixels: e.g. a 7x7 input, a 3x3 filter applied with stride 1, padded with a 1-pixel border => a 7x7 output! To preserve the spatial size with stride 1 and an FxF filter, zero-pad with (F-1)/2: F = 3 => zero pad with 1, F = 5 => zero pad with 2, F = 7 => zero pad with 3.
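The stride and padding rules combine into one output-size formula, (N - F + 2P)/S + 1, which is worth encoding as a sanity check (the function name is ours):

```python
def conv_output_size(N, F, stride=1, pad=0):
    # Spatial output size of a conv layer: (N - F + 2*pad) / stride + 1
    size = (N - F + 2 * pad) / stride + 1
    assert size == int(size), "filter does not tile the input evenly"
    return int(size)
```

It reproduces the slide examples: a 7x7 input with a 3x3 filter gives 5 at stride 1, 3 at stride 2, and 7 with 1-pixel padding; a 32x32 input with a 5x5 filter gives 28.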
Conv layer parameters. Common settings: K = powers of 2 (e.g. 32, 64, 128, 512); F = 3, S = 1, P = 1; F = 5, S = 1, P = 2; F = 5, S = 2, P = whatever fits; F = 1, S = 1, P = 0.
Pooling layer. Makes the representations smaller and more manageable; operates over each activation map independently (downsampling). Example: max pooling.
Pooling. Max pooling introduces translation invariance. Pooling with downsampling reduces the representation size, reduces computational cost, and increases statistical efficiency.
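Max pooling written out for a single activation map (2x2 windows, stride 2, the most common setting; the function name is ours):

```python
import numpy as np

def max_pool(x, size=2, stride=2):
    # Keep the maximum of each size x size window, moving by `stride`;
    # applied to each activation map independently
    H, W = x.shape
    out = np.zeros((H // stride, W // stride))
    for i in range(0, H - size + 1, stride):
        for j in range(0, W - size + 1, stride):
            out[i // stride, j // stride] = np.max(x[i:i + size, j:j + size])
    return out
```

On the standard 4x4 example a 2x2/stride-2 max pool yields the 2x2 map of window maxima.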
Pooling layer parameters. Common settings: F = 2, S = 2; F = 3, S = 2.
CNN layers. Layers used to build ConvNets: INPUT: raw pixel values; CONV: convolutional layer; ReLU: introducing nonlinearity; POOL: downsampling; FC: fully-connected layer for computing class scores.
CNN architecture. Stack the layers in an appropriate order. Babenko et al., Hu et al.
CNN architecture
Case study: LeNet-5 [LeCun et al., 1998]. Conv filters were 5x5, applied at stride 1; subsampling (pooling) layers were 2x2, applied at stride 2; i.e. the architecture is [CONV-POOL-CONV-POOL-CONV-FC].
Case study: AlexNet [Krizhevsky et al. 2012]. Layers: INPUT, CONV1, POOL1, NORM1, CONV2, POOL2, NORM2, CONV3, CONV4, CONV5, POOL3, FC6, FC7, FC8. http://fromdata.org/2015/10/01/imagenet-cnn-architecture-image/
Case study: VGGNet [Simonyan and Zisserman, 2014]. Only 3x3 CONV (stride 1, pad 1) and 2x2 MAX POOL (stride 2). Best model: 11.2% top-5 error in ILSVRC 2013 -> 7.3% top-5 error.
Case study: GoogLeNet [Szegedy et al., 2014]. Inception module. ILSVRC 2014 winner (6.7% top-5 error).
Case study: ResNet. ILSVRC 2015 winner (3.6% top-5 error). Spatial dimension quickly reduced to only 56x56. Batch normalization after every CONV layer; Xavier/2 initialization from He et al.; SGD + momentum (0.9); learning rate 0.1, divided by 10 when the validation error plateaus; mini-batch size 256; weight decay of 1e-5; no dropout used.
Case study: Inception-v4 [Szegedy et al., 2016]. 3.6% top-5 error, 75 layers.
Analysis of DNN models [Canziani et al., 2017]
Transfer learning. If you don't have enough data, use pretrained models! 1. Train on ImageNet. 2. Small dataset: use the network as a feature extractor; freeze the lower layers and train only the top layer. 3. Medium dataset: finetuning; the more data you have, the more of the network you can retrain (or all of it). Tip: when finetuning, use only ~1/10th of the original learning rate on the top layer and ~1/100th on the intermediate layers.
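Framework-free, the freeze/finetune recipe is just a per-layer learning rate: 0 for frozen layers, a small rate for intermediate layers, a larger one for the top layer. A sketch (the function name and the layers-as-lists representation are ours):

```python
def finetune_step(weights, grads, lrs):
    # One gradient step with a separate learning rate per layer:
    # lr = 0 freezes a layer; per the tip above, the top layer might get
    # ~1/10 of the original rate and intermediate layers ~1/100
    return [w - lr * g for w, g, lr in zip(weights, grads, lrs)]
```

With lrs = [0.0, 0.001, 0.01] the first (frozen) layer is untouched while the upper layers move by amounts proportional to their rates.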
Wide usability of ConvNets. Classification, retrieval [Krizhevsky 2012]
Wide usability of ConvNets. Detection [Faster R-CNN: Ren, He, Girshick, Sun 2015]; segmentation [Farabet et al., 2012]
Wide usability of ConvNets. Self-driving cars (NVIDIA Tegra X1)
Wide usability of ConvNets. [Taigman et al. 2014] [Simonyan et al. 2014] [Goodfellow 2014]
Wide usability of ConvNets. [Toshev, Szegedy 2014] [Mnih 2013]
Wide usability of ConvNets. [Ciresan et al. 2013] [Sermanet et al. 2011] [Ciresan et al.]
Wide usability of ConvNets. [Denil et al. 2014] [Turaga et al., 2010]
Wide usability of ConvNets. Whale recognition (Kaggle Challenge); Mnih and Hinton, 2010
Wide usability of ConvNets. Image captioning [Vinyals et al., 2015]
Wide usability of ConvNets. reddit.com/r/deepdream
Literature. Michael A. Nielsen, Neural Networks and Deep Learning, Determination Press, 2015, http://neuralnetworksanddeeplearning.com/index.html. Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning, MIT Press, 2016, http://www.deeplearningbook.org/. Fei-Fei Li, Andrej Karpathy, Justin Johnson, CS231n: Convolutional Neural Networks for Visual Recognition, Stanford University, 2016, http://cs231n.stanford.edu/. Papers.
Software. Neural networks in Python. Convolutional neural networks using Theano or TensorFlow or other deep learning frameworks.