Perplexity of n-gram and dependency language models


Perplexity of n-gram and dependency language models
Martin Popel, David Mareček
ÚFAL, Charles University in Prague
TSD, 13th International Conference on Text, Speech and Dialogue
September 8, 2010, Brno

Outline
Language Models (LM): basics, design decisions
Post-ngram LM
Dependency LM
Evaluation
Conclusion & future plans

Language Models: basics
P(s) = ?
P("The dog barked again") > P("The dock barked again")

Language Models: basics
P(s) = P(w_1, w_2, ..., w_m)
P("The dog barked again") = P(w_1 = "The", w_2 = "dog", w_3 = "barked", w_4 = "again")

Language Models: basics
P(s) = P(w_1, w_2, ..., w_m) = P(w_1) · P(w_2 | w_1) · ... · P(w_m | w_1, ..., w_{m-1})    [chain rule]
P("The dog barked again") =
  P(w_1 = "The")
· P(w_2 = "dog" | w_1 = "The")
· P(w_3 = "barked" | w_1 = "The", w_2 = "dog")
· P(w_4 = "again" | w_1 = "The", w_2 = "dog", w_3 = "barked")

Language Models: basics
P(s) = P(w_1, w_2, ..., w_m) = P(w_1) · P(w_2 | w_1) · ... · P(w_m | w_1, ..., w_{m-1})    [changed notation]
P("The dog barked again") =
  P(w_i = "The" | i=1)
· P(w_i = "dog" | i=2, w_{i-1} = "The")
· P(w_i = "barked" | i=3, w_{i-2} = "The", w_{i-1} = "dog")
· P(w_i = "again" | i=4, w_{i-3} = "The", w_{i-2} = "dog", w_{i-1} = "barked")

Language Models: basics
P(s) = P(w_1, w_2, ..., w_m) = P(w_1) · P(w_2 | w_1) · ... · P(w_m | w_1, ..., w_{m-1})    [artificial start-of-sentence token]
P("The dog barked again") =
  P(w_i = "The" | i=1, w_{i-1} = NONE)
· P(w_i = "dog" | i=2, w_{i-2} = NONE, w_{i-1} = "The")
· P(w_i = "barked" | i=3, w_{i-3} = NONE, w_{i-2} = "The", w_{i-1} = "dog")
· P(w_i = "again" | i=4, w_{i-4} = NONE, w_{i-3} = "The", w_{i-2} = "dog", w_{i-1} = "barked")

Language Models: basics
P(s) = P(w_1, w_2, ..., w_m) = P(w_1) · P(w_2 | w_1) · ... · P(w_m | w_1, ..., w_{m-1})    [position backoff]
P("The dog barked again") ≈
  P(w_i = "The" | w_{i-1} = NONE)
· P(w_i = "dog" | w_{i-2} = NONE, w_{i-1} = "The")
· P(w_i = "barked" | w_{i-3} = NONE, w_{i-2} = "The", w_{i-1} = "dog")
· P(w_i = "again" | w_{i-4} = NONE, w_{i-3} = "The", w_{i-2} = "dog", w_{i-1} = "barked")

Language Models: basics
P(s) = P(w_1, w_2, ..., w_m) ≈ Π_{i=1..m} P(w_i | w_{i-1})    [history backoff (bigram LM)]
P("The dog barked again") ≈
  P(w_i = "The" | w_{i-1} = NONE)
· P(w_i = "dog" | w_{i-1} = "The")
· P(w_i = "barked" | w_{i-1} = "dog")
· P(w_i = "again" | w_{i-1} = "barked")
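The bigram factorization above can be made concrete with a small sketch. This is a minimal illustration assuming a tiny hand-made corpus and plain MLE estimates (no smoothing), not the setup evaluated in the talk:

```python
from collections import Counter

# Toy corpus, invented for illustration only.
corpus = [["the", "dog", "barked", "again"],
          ["the", "dog", "barked", "loudly"],
          ["the", "cat", "slept"]]

unigrams = Counter()
bigrams = Counter()
for sent in corpus:
    tokens = ["<s>"] + sent          # artificial start-of-sentence token (NONE)
    for prev, cur in zip(tokens, tokens[1:]):
        unigrams[prev] += 1          # count each token as a history
        bigrams[(prev, cur)] += 1

def p_bigram(word, prev):
    """MLE estimate of P(w_i | w_{i-1})."""
    return bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0

def p_sentence(sent):
    """P(s) ≈ Π_i P(w_i | w_{i-1}), with the <s> start token as history."""
    prob = 1.0
    tokens = ["<s>"] + sent
    for prev, cur in zip(tokens, tokens[1:]):
        prob *= p_bigram(cur, prev)
    return prob

print(p_sentence(["the", "dog", "barked", "again"]))  # 1 · 2/3 · 1 · 1/2 = 1/3
```

Without smoothing, any unseen bigram zeroes out the whole sentence, which is exactly why design decision 3 below (smoothing) matters.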

Language Models: basics
P(s) = P(w_1, w_2, ..., w_m) ≈ Π_{i=1..m} P(w_i | w_{i-2}, w_{i-1})    [history backoff (trigram LM)]
P("The dog barked again") ≈
  P(w_i = "The" | w_{i-2} = NONE, w_{i-1} = NONE)
· P(w_i = "dog" | w_{i-2} = NONE, w_{i-1} = "The")
· P(w_i = "barked" | w_{i-2} = "The", w_{i-1} = "dog")
· P(w_i = "again" | w_{i-2} = "dog", w_{i-1} = "barked")

Language Models: basics
In general: P(s) = P(w_1, w_2, ..., w_m) ≈ Π_{i=1..m} P(w_i | h_i), where h_i is the context (history) of word w_i.

Language Models: design decisions
1) How to factorize P(w_1, w_2, ..., w_m) into Π_{i=1..m} P(w_i | h_i), i.e. which word positions will be used as the context h_i?  ← this work (n-gram-based LMs use h_i = w_{i-n+1}, ..., w_{i-1})
2) What additional context information will be used (apart from word forms), e.g. stems, lemmata, POS tags, word classes, ...?
3) How to estimate P(w_i | h_i) from the training data? Which smoothing technique will be used? (Good-Turing, Jelinek-Mercer, Katz, Kneser-Ney, ..., Generalized Parallel Backoff etc.  ← other papers) Here: linear interpolation, with weights trained by EM.
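The estimation choice mentioned on the slide, linear interpolation with weights trained by EM, can be sketched as follows. The held-out component probabilities are invented for illustration; a real setup would compute them from trained trigram, bigram, and unigram models:

```python
# Each row holds the (trigram, bigram, unigram) probabilities assigned
# to one held-out token; the numbers here are made up for illustration.
heldout = [(0.30, 0.10, 0.01),
           (0.00, 0.20, 0.02),
           (0.50, 0.25, 0.05),
           (0.00, 0.00, 0.03)]

def em_weights(data, iters=50):
    """Train interpolation weights λ by EM on held-out data."""
    lambdas = [1/3, 1/3, 1/3]                    # uniform initialization
    for _ in range(iters):
        expected = [0.0, 0.0, 0.0]
        for probs in data:
            mix = sum(l * p for l, p in zip(lambdas, probs))
            for k in range(3):                   # E-step: responsibilities
                expected[k] += lambdas[k] * probs[k] / mix
        total = sum(expected)
        lambdas = [e / total for e in expected]  # M-step: renormalize
    return lambdas

print(em_weights(heldout))
```

The interpolated estimate is then P(w|h) = λ_3·P_tri + λ_2·P_bi + λ_1·P_uni; because the lower-order models always assign some probability, the mixture never hits zero on unseen n-grams.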

Outline
Language Models (LM): basics, design decisions
Post-ngram LM
Dependency LM
Evaluation
Conclusion & future plans

Post-ngram LM
In general: P(s) = P(w_1, w_2, ..., w_m) ≈ Π_{i=1..m} P(w_i | h_i), where h_i is the context (history) of word w_i.
Left-to-right factorization order:
- Bigram LM: h_i = w_{i-1} (one previous word)
- Trigram LM: h_i = w_{i-2}, w_{i-1} (two previous words)
Right-to-left factorization order:
- Post-bigram LM: h_i = w_{i+1} (one following word)
- Post-trigram LM: h_i = w_{i+1}, w_{i+2} (two following words)
Post-bigram example:
P("The dog barked again") = P("again" | NONE) · P("barked" | "again") · P("dog" | "barked") · P("The" | "dog")
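The right-to-left (post-bigram) factorization can be sketched like this. The conditional probability table holds toy values for the running example, not estimates from data:

```python
# Toy table of P(word | following word); values invented for illustration.
p = {("again", "NONE"): 0.1,
     ("barked", "again"): 0.2,
     ("dog", "barked"): 0.3,
     ("The", "dog"): 0.4}

def post_bigram_prob(sentence):
    """P(s) ≈ Π_i P(w_i | w_{i+1}), with NONE after the last word."""
    prob = 1.0
    padded = sentence + ["NONE"]          # artificial end-of-sentence context
    for i, word in enumerate(sentence):
        prob *= p[(word, padded[i + 1])]  # condition on the FOLLOWING word
    return prob

print(post_bigram_prob(["The", "dog", "barked", "again"]))  # 0.4·0.3·0.2·0.1
```

Note that both factorization orders decompose the same joint P(s); they differ only in which exact conditionals must be estimated, which is why their perplexities can differ after smoothing.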

Outline
Language Models (LM): basics, design decisions
Post-ngram LM
Dependency LM
Evaluation
Conclusion & future plans

Dependency LM
Exploit the topology of dependency trees.
"The dog barked again"  →  (MALT parser)  →

barked
├── dog
│   └── The
└── again

P("The dog barked again") = P("The" | "dog") · P("dog" | "barked") · P("barked" | NONE) · P("again" | "barked")
h_i = parent(w_i)
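The h_i = parent(w_i) factorization can be sketched as follows; the tree and the probability table are toy values for the running example:

```python
# Dependency tree of the running example: each word maps to the word
# form of its parent (the root's parent is NONE).
words = ["The", "dog", "barked", "again"]
parent = {"The": "dog", "dog": "barked", "barked": "NONE", "again": "barked"}

# Toy table of P(word | parent word form); values invented for illustration.
p_wp = {("The", "dog"): 0.5,
        ("dog", "barked"): 0.3,
        ("barked", "NONE"): 0.1,
        ("again", "barked"): 0.2}

def dep_lm_prob(words, parent, table):
    """P(s) ≈ Π_i P(w_i | parent(w_i)), the "wp" dependency model."""
    prob = 1.0
    for w in words:
        prob *= table[(w, parent[w])]
    return prob

print(dep_lm_prob(words, parent, p_wp))  # 0.5·0.3·0.1·0.2
```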

Dependency LM: long-distance dependencies
"The dog I heard last night barked again"

barked
├── dog
│   ├── The
│   └── heard
│       ├── I
│       └── night
│           └── last
└── again

Dependency LM: motivation for usage
How can we know the dependency structure without knowing the word forms?
For example in tree-to-tree machine translation:
ANALYSIS: "Ten pes štěkal znovu" → Czech tree (štěkal → pes, znovu; pes → Ten)
TRANSFER: Czech tree → English tree (barked → dog, again; dog → The)
SYNTHESIS: English tree → "The dog barked again"

Dependency LM: examples
Model wp: word form of parent
P("The dog barked again") = P("The" | "dog") · P("dog" | "barked") · P("barked" | NONE) · P("again" | "barked")

Dependency LM: examples
Model wp,wg: word form of parent, word form of grandparent
P("The dog barked again") = P("The" | "dog", "barked") · P("dog" | "barked", NONE) · P("barked" | NONE, NONE) · P("again" | "barked", NONE)

Dependency LM: examples
Model E,wp: edge direction, word form of parent
P("The dog barked again") = P("The" | right, "dog") · P("dog" | right, "barked") · P("barked" | left, NONE) · P("again" | left, "barked")

Dependency LM: examples
Model C,wp: number of children, word form of parent
P("The dog barked again") = P("The" | 0, "dog") · P("dog" | 1, "barked") · P("barked" | 2, NONE) · P("again" | 0, "barked")

Dependency LM: examples
Model N,wp: the word is the N-th child of its parent, word form of parent
P("The dog barked again") = P("The" | 1, "dog") · P("dog" | 1, "barked") · P("barked" | 1, NONE) · P("again" | 2, "barked")

Dependency LM: examples of additional context information
Model tp,wp: POS tag of parent, word form of parent
P("The dog barked again") = P("The" | NN, "dog") · P("dog" | VBD, "barked") · P("barked" | NONE, NONE) · P("again" | VBD, "barked")
(A naïve tagger assigns the most frequent tag for a given word.)

Dependency LM: examples of additional context information
Model Tp,wp: coarse-grained POS tag of parent, word form of parent
P("The dog barked again") = P("The" | N, "dog") · P("dog" | V, "barked") · P("barked" | x, NONE) · P("again" | V, "barked")

Dependency LM: examples of additional context information
Model E,C,wp,N: edge direction, number of children, word form of parent, the word is the N-th child of its parent
P("The dog barked again") = P("The" | right, 0, "dog", 1) · P("dog" | right, 1, "barked", 1) · P("barked" | left, 2, NONE, 1) · P("again" | left, 0, "barked", 2)
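A sketch of how the E, C, and N context features could be extracted from a dependency tree given as a parent-index array. The exact conventions (e.g. the direction assigned to the root) are assumptions chosen to match the example above:

```python
# Running example: 1-based word positions, heads[i] = position of the
# parent of word i+1 (0 = artificial root).
words = ["The", "dog", "barked", "again"]
heads = [2, 3, 0, 3]

def features(words, heads):
    """For each word return (form, E, C, parent form, N):
    E = edge direction (where the parent lies), C = number of children,
    N = the word's order among its parent's children."""
    n = len(words)
    children = {h: [] for h in range(n + 1)}
    for i, h in enumerate(heads, start=1):
        children[h].append(i)                      # left-to-right child order
    feats = []
    for i, h in enumerate(heads, start=1):
        direction = "right" if h > i else "left"   # root falls into "left" here
        n_children = len(children[i])
        order = children[h].index(i) + 1           # word i is the N-th child
        parent_form = words[h - 1] if h else "NONE"
        feats.append((words[i - 1], direction, n_children, parent_form, order))
    return feats

for f in features(words, heads):
    print(f)
```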

Outline
Language Models (LM): basics, design decisions
Post-ngram LM
Dependency LM
Evaluation
Conclusion & future plans

Evaluation
Train and test data from the CoNLL 2007 shared task
7 languages: Arabic, Catalan, Czech, English (450 000 tokens, 3 % OOV), Hungarian, Italian (75 000 tokens), and Turkish (26 % OOV)
Cross-entropy = -(1/|T|) Σ_{i=1..|T|} log_2 P(w_i | h_i), measured on the test data T
Perplexity = 2^cross-entropy
Lower perplexity ~ better LM
Baseline: trigram LM
4 experimental settings: PLAIN, TAGS, DEP, DEP+TAGS
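The evaluation measures can be sketched directly; the per-token probabilities below are made-up values, not results from the paper:

```python
import math

# P(w_i | h_i) for each test token; toy values for illustration.
token_probs = [0.25, 0.1, 0.5, 0.05]

def cross_entropy(probs):
    """Average negative log2 probability per token on the test data."""
    return -sum(math.log2(p) for p in probs) / len(probs)

def perplexity(probs):
    """Perplexity = 2 raised to the cross-entropy."""
    return 2 ** cross_entropy(probs)

print(perplexity(token_probs))
```

Equivalently, perplexity is the inverse geometric mean of the token probabilities, which is why lower perplexity corresponds to a better language model.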

[Chart: normalized perplexity (baseline = 100 %) per language (ar, ca, cs, en, hu, it, tr) for the models: w-1,w-2 (BASELINE); w+1,w+2 (PLAIN); T+1,t+1,l+1,w+1,T+2,t+2,l+2,w+2 (TAGS); E,C,wp,N,wg (DEP); E,C,Tp,tp,N,lp,wp,Tg,tg,lg (DEP+TAGS)]

Conclusion
Findings confirmed for all seven languages. Improvement over baseline for English:
- Post-trigram better than trigram, post-bigram better than bigram (PLAIN: 8 %)
- Additional context (POS & lemma) helps (TAGS: 20 %)
- Dependency structure helps even more (DEP: 24 %)
- The best perplexity achieved with DEP+TAGS: 31 %

Future plans
Investigate the reason for the better post-ngram LM perplexity
Extrinsic evaluation: post-ngram LM in speech recognition, dependency LM in tree-to-tree machine translation
Better smoothing using Generalized Parallel Backoff
Bigger LMs for real applications

Thank you