Generation of Image Descriptions. Tambet Matiisen

Generation of Image Descriptions. Tambet Matiisen, 14.10.2015

Agenda
Datasets
Convolutional neural networks
Neural language models
Neural machine translation
Generation of image descriptions
Attention
Metrics

A year ago
Baidu/UCLA: Explain Images with Multimodal Recurrent Neural Networks
Toronto: Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models
Berkeley: Long-term Recurrent Convolutional Networks for Visual Recognition and Description
Google: Show and Tell: A Neural Image Caption Generator
Stanford: Deep Visual-Semantic Alignments for Generating Image Descriptions
UML/UT: Translating Videos to Natural Language Using Deep Recurrent Neural Networks
Microsoft/CMU: Learning a Recurrent Visual Representation for Image Caption Generation
Microsoft: From Captions to Visual Concepts and Back

123287 images, 5 descriptions for each
1. A woman with a bike walks by a blue bus with the bbc logo in the front of it.
2. A woman with a bike in front of a bus.
3. A girl walks her bicycle in front of a bus on a busy city street.
4. Young girl with bicycle in front of a public transportation bus and large group of people.
5. Woman with a bicycle wearing a helmet crossing the street in front of a blue bus.

Description vs Caption
SBU 1M Captions, BBC News
Description: A woman with a bike walks by a blue bus with the bbc logo in the front of it.
VS
Caption: Me and Lisa had a blast in London last weekend.

Convolutional neural networks
Learn layers of hierarchical features.
Transfer learning: discard the last classification layer and use the fixed network as a feature extractor.
Image: NVidia

Transfer learning

Language Model
Predict next word using previous words.
THE CAT SAT ON A MAT???
Classical N-gram model
Feed-forward neural network
Recurrent neural network
Long Short-Term Memory (LSTM)

Tri-gram Model
$$P(w_t \mid w_{t-2}, w_{t-1}) = \frac{\mathrm{count}(w_{t-2}\, w_{t-1}\, w_t)}{\mathrm{count}(w_{t-2}\, w_{t-1})}$$
[Diagram labels: THE, CAT, SAT, ON, A, MAT]
Simple to implement. Huge memory needs in case of bigger vocabulary and bigger N.
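The count-based estimate above can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the function names `train_trigram` and `trigram_prob` and the toy sentence are assumptions, and no smoothing is applied.

```python
from collections import Counter

def train_trigram(tokens):
    """Count trigrams and the two-word contexts that precede a third word."""
    trigrams = list(zip(tokens, tokens[1:], tokens[2:]))
    tri = Counter(trigrams)
    # Count each context only where a following word exists, so that the
    # probabilities for one context sum to 1.
    ctx = Counter((a, b) for a, b, _ in trigrams)
    return tri, ctx

def trigram_prob(tri, ctx, w2, w1, w):
    """P(w | w2 w1) = count(w2 w1 w) / count(w2 w1); 0.0 for unseen contexts."""
    total = ctx[(w2, w1)]
    return tri[(w2, w1, w)] / total if total else 0.0

# Toy corpus (hypothetical).
tokens = "the cat sat on the mat and the cat sat on the sofa".split()
tri, ctx = train_trigram(tokens)
p_mat = trigram_prob(tri, ctx, "on", "the", "mat")
print(p_mat)
```

The two counters are also where the memory cost mentioned on the slide comes from: the number of possible keys grows with the vocabulary size raised to the power N.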

Feed-forward Neural Network
Output: MAT
Hidden layer: H
Inputs: THE CAT SAT ON THE
Straightforward extension of N-gram, more powerful model. Still only fixed context is considered.

Neural Language Model
Layers from output to input:
Softmax output layer (probabilities for word t): V nodes
HxV weights
Hidden layer: H nodes
DxH weights (one matrix per context word)
Learned distributed representations of word t-2 and word t-1: D nodes each
VxD weights (shared)
1-of-V representations of word t-2 and word t-1: V nodes each
Bengio et al. A Neural Probabilistic Language Model (2003)

Recurrent Neural Network
Outputs: THE CAT SAT ON THE MAT
Hidden states: H1 H2 H3 H4 H5 H6
Inputs: <BOS> THE CAT SAT ON THE
Theoretically retains context of any length.
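To make the recurrence concrete, here is a minimal pure-Python sketch of a vanilla RNN forward pass over the slide's input sequence. The hidden size, the random weights, and all function names are assumptions for illustration. Nothing is trained and the output layer is omitted.

```python
import math
import random

def rnn_step(x, h_prev, W_xh, W_hh, b):
    """One vanilla RNN step: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b)."""
    h = []
    for i in range(len(b)):
        s = b[i]
        s += sum(W_xh[i][j] * x[j] for j in range(len(x)))
        s += sum(W_hh[i][j] * h_prev[j] for j in range(len(h_prev)))
        h.append(math.tanh(s))
    return h

def run_rnn(inputs, W_xh, W_hh, b):
    """Feed the inputs one by one, carrying the hidden state forward."""
    h = [0.0] * len(b)
    states = []
    for x in inputs:
        h = rnn_step(x, h, W_xh, W_hh, b)
        states.append(h)
    return states

def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]

vocab = ["<BOS>", "THE", "CAT", "SAT", "ON"]
sentence = ["<BOS>", "THE", "CAT", "SAT", "ON", "THE"]
H = 3  # hypothetical hidden size
rng = random.Random(0)  # fixed seed; the weights are random, not learned
W_xh = [[rng.uniform(-0.5, 0.5) for _ in vocab] for _ in range(H)]
W_hh = [[rng.uniform(-0.5, 0.5) for _ in range(H)] for _ in range(H)]
b = [0.0] * H

inputs = [one_hot(vocab.index(w), len(vocab)) for w in sentence]
states = run_rnn(inputs, W_xh, W_hh, b)  # H1 ... H6
```

The word THE appears at positions 2 and 6, yet its two hidden states differ because each state also depends on everything seen before it. That dependence is what lets the model use context of arbitrary length.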

Long Short-Term Memory
Able to retain context longer than vanilla RNN.
Image: Wikipedia

Neural Machine Translation
Outputs: KASS ISTUS MATIL <EOS> (Estonian for "the cat sat on the mat")
Hidden states: H1 H2 H3 H4 H5 H6 H7 H8 H9 H10
Inputs: THE CAT SAT ON THE MAT <EOS> KASS ISTUS MATIL

Generating Descriptions for Images
Outputs: THE CAT SAT ON THE MAT <EOS>
Hidden states: H1 H2 H3 H4 H5 H6 H7
Inputs: THE CAT SAT ON THE MAT

Vinyals et al. Show and Tell: A Neural Image Caption Generator (2014)

Demos
http://nic.droppages.com/ Results for 1000 images from each dataset. In addition, one ground truth sentence is shown.
http://cs.stanford.edu/people/karpathy/deepimagesent/rankingdemo/ For every test set sentence below we retrieve the top images (from set of 1000).
http://deeplearning.cs.toronto.edu/i2t (Internal Server Error)
https://www.youtube.com/watch?v=w2iv8gt5cd4&feature=youtu.be

Attention
"The concept of attention is the most interesting recent architectural innovation in neural networks." Andrej Karpathy
Two kinds of attention:
Soft attention
Hard attention

Soft Attention
A probability distribution is laid over the image. This distribution depends on higher-level features and is learned using backpropagation.
Xu et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention (2015)
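A rough sketch of the soft attention idea: a softmax turns one score per image region into a probability distribution, and the context passed on to the language model is the probability-weighted average of the region features. The region features and scores below are made-up numbers. In the paper the scores come from a learned function, which is simply assumed as given here.

```python
import math

def softmax(scores):
    """Turn arbitrary scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def soft_attention(region_features, scores):
    """Return attention weights and the weighted average of region features."""
    weights = softmax(scores)
    dim = len(region_features[0])
    context = [
        sum(w * f[d] for w, f in zip(weights, region_features))
        for d in range(dim)
    ]
    return weights, context

# Four image regions with 2-dimensional features (hypothetical values).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
scores = [2.0, 0.5, 0.1, -1.0]  # assumed output of a learned scoring function
weights, context = soft_attention(features, scores)
```

Every step is a differentiable operation, which is why the distribution can be learned with plain backpropagation. Every region also contributes to the context, so the cost grows with the number of regions.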

Correct examples
Xu et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention (2015)

Incorrect examples
Xu et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention (2015)

Hard Attention
At each time step the network focuses only on part of the image. Implemented using reinforcement learning.
Mnih et al. Recurrent Models of Visual Attention (2014)
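In contrast to a weighted average, hard attention picks a single part of the image per time step. The sketch below only illustrates that sampling step, under the assumption that a distribution over regions is already available. The function name and the values are hypothetical, and the reinforcement learning update used for training is not shown.

```python
import random

def hard_attention(region_features, probs, rng):
    """Sample one region index from probs and return only that region's features."""
    idx = rng.choices(range(len(region_features)), weights=probs, k=1)[0]
    return idx, region_features[idx]

# Four image regions with 2-dimensional features (hypothetical values).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
probs = [0.7, 0.2, 0.1, 0.0]  # assumed attention distribution
rng = random.Random(0)
idx, glimpse = hard_attention(features, probs, rng)
```

Sampling is not differentiable, so gradients cannot flow through the choice of region. That is the reason the slide mentions reinforcement learning rather than backpropagation alone.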

Hard Attention Implementation
In case of classification, the action is the class. The class of the last glimpse is the output of the network.
Mnih et al. Recurrent Models of Visual Attention (2014)

Soft vs Hard Attention
Soft attention: Simpler to implement. Doesn't scale to big images.
Hard attention: More complicated implementation. Scales to big images and beats convolutional networks.
Xu et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention (2015)

Metrics
BLEU (Bilingual Evaluation Understudy)
METEOR (Metric for Evaluation of Translation with Explicit ORdering)
CIDEr (Consensus-based Image Description Evaluation)
ROUGE (Recall-Oriented Understudy for Gisting Evaluation)
TER (Translation Error Rate)

BLEU
The closer a machine translation is to a professional human translation, the better it is.
N-gram overlap between machine translation output and reference translation.
Compute precision for n-grams of size 1 to 4.
Add brevity penalty (for too short translations).
$$\mathrm{BLEU}_4 = \min\left(1, \frac{\text{output length}}{\text{reference length}}\right) \left(\prod_{i=1}^{4} \mathrm{precision}_i\right)^{1/4}$$
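The formula above can be sketched directly in Python. This follows the simplified form on the slide (a single reference and a length-ratio brevity penalty), not the exact definition used by standard BLEU toolkits. The function names are illustrative, and the example sentences are taken from the dataset slide earlier in the talk.

```python
from collections import Counter

def ngrams(tokens, n):
    """Count all n-grams of size n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def clipped_precision(candidate, reference, n):
    """Share of candidate n-grams found in the reference, with clipped counts."""
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total

def bleu4(candidate, reference):
    """Brevity penalty times the geometric mean of 1- to 4-gram precisions."""
    product = 1.0
    for n in range(1, 5):
        product *= clipped_precision(candidate, reference, n)
    brevity = min(1.0, len(candidate) / len(reference))  # assumes a non-empty reference
    return brevity * product ** 0.25

reference = "a woman with a bike in front of a bus".split()
full = bleu4(reference, reference)          # identical output
short = bleu4(reference[:5], reference)     # "a woman with a bike"
unrelated = bleu4("me and lisa had a blast".split(), reference)
print(full, short, unrelated)
```

The truncated output matches the reference perfectly on every n-gram it contains, so only the brevity penalty lowers its score. The unrelated sentence shares one unigram but no bigram, so the geometric mean drops to zero.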

Correlations with Human Judgement (Spearman's rho)
CIDEr       0.581
METEOR      0.560
BLEU4       0.459
ROUGE-SU4   0.440
TER        -0.290
Desmond Elliott, https://github.com/elliottd/compareimagedescriptionmeasures
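For readers unfamiliar with the statistic, Spearman's rho is the Pearson correlation computed on ranks instead of raw values. The sketch below is a generic pure-Python version with average ranks for ties. It is not the code from the linked repository, and the example scores are invented.

```python
def ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def spearman_rho(xs, ys):
    """Pearson correlation of the ranks of xs and ys (assumes non-constant input)."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical data: human ratings and metric scores for five descriptions.
human = [1, 2, 3, 4, 5]
metric = [0.1, 0.4, 0.5, 0.8, 0.9]
rho = spearman_rho(human, metric)
```

Because only ranks matter, a metric can reach rho = 1 without being linearly related to human scores. A metric where lower is better, such as an error rate, would be expected to show a negative rho.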

Ideal World
Desmond Elliott, https://github.com/elliottd/compareimagedescriptionmeasures

BLEU4
Desmond Elliott, https://github.com/elliottd/compareimagedescriptionmeasures

METEOR
Desmond Elliott, https://github.com/elliottd/compareimagedescriptionmeasures

Thank you! Tambet Matiisen tambet@ut.ee