Recurrent neural network grammars. Slide credits: Chris Dyer, Adhiguna Kuncoro


Widespread phenomenon: polarity items can only appear in certain contexts. Example: anybody is a polarity item that tends to appear only in specific contexts: "The lecture that I gave did not appeal to anybody", but not: *"The lecture that I gave appealed to anybody". We might infer that the licensing context is the word "not" appearing somewhere among the preceding words, and we could use an RNN to model this. However, that predicts the wrong judgment for: *"The lecture that I did not give appealed to anybody".

Language is hierarchical. The licensing context depends on recursive structure (syntax). (Tree diagrams on the slide contrast the two examples: "The lecture that I gave did not appeal to anybody" vs. *"The lecture that I did not give appealed to anybody".)

One theory of hierarchy: generate symbols sequentially using an RNN; add some control symbols to rewrite the history periodically; periodically compress a sequence into a single constituent. Augment the RNN with an operation to compress recent history into a single vector ("reduce"). The RNN predicts the next symbol based on the history of compressed elements and non-compressed terminals ("shift" or "generate"). The RNN must also predict the control symbols that decide how big constituents are. We call such models recurrent neural network grammars.

(Ordered) tree traversals are sequences. The tree for "The hungry cat meows." (S dominating NP "The hungry cat" and VP "meows") corresponds to the symbol sequence: S( NP( The hungry cat ) VP( meows ) . )
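A small illustration of this point (not from the slides; the nested-tuple tree encoding and function name are hypothetical): a depth-first traversal of the tree emits exactly this bracketed symbol sequence.

```python
# Hypothetical tree encoding: a constituent is (label, [children]); a word is a plain string.
tree = ("S", [("NP", ["The", "hungry", "cat"]),
              ("VP", ["meows"]),
              "."])

def traversal(node):
    """Depth-first traversal that emits the bracketed symbol sequence."""
    if isinstance(node, str):          # terminal word
        return [node]
    label, children = node             # nonterminal
    out = [label + "("]
    for child in children:
        out.extend(traversal(child))
    out.append(")")
    return out

print(" ".join(traversal(tree)))
# S( NP( The hungry cat ) VP( meows ) . )
```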

Terminals                 | Stack                                  | Action
                          |                                        | NT(S)
                          | (S                                     | NT(NP)
                          | (S (NP                                 | GEN(The)
The                       | (S (NP The                             | GEN(hungry)
The hungry                | (S (NP The hungry                      | GEN(cat)
The hungry cat            | (S (NP The hungry cat                  | REDUCE
The hungry cat            | (S (NP The hungry cat)                 | NT(VP)
The hungry cat            | (S (NP The hungry cat) (VP             | GEN(meows)
The hungry cat meows      | (S (NP The hungry cat) (VP meows       | REDUCE
The hungry cat meows      | (S (NP The hungry cat) (VP meows)      | GEN(.)
The hungry cat meows .    | (S (NP The hungry cat) (VP meows) .    | REDUCE
The hungry cat meows .    | (S (NP The hungry cat) (VP meows) .)   |

REDUCE compresses "The hungry cat" into a single composite symbol: (NP The hungry cat).

Q: What information can we use to predict the next action, and how can we encode it with an RNN?
A: We can use an RNN for each of: 1. previous terminal symbols, 2. previous actions, 3. current stack contents.

The final stack symbol is (a vector representation of) the complete tree.
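The action semantics in the table can be made concrete with a small simulator (a minimal sketch, not the authors' implementation): it replays the action sequence on an explicit stack and prints the terminals, stack, and action columns.

```python
actions = ["NT(S)", "NT(NP)", "GEN(The)", "GEN(hungry)", "GEN(cat)", "REDUCE",
           "NT(VP)", "GEN(meows)", "REDUCE", "GEN(.)", "REDUCE"]

stack, terminals = [], []
for act in actions:
    print(f"{' '.join(terminals):25} | {' '.join(stack):40} | {act}")
    if act.startswith("NT("):                    # open a new constituent
        stack.append("(" + act[3:-1])
    elif act.startswith("GEN("):                 # generate a terminal word
        word = act[4:-1]
        stack.append(word)
        terminals.append(word)
    elif act == "REDUCE":                        # close the most recent open constituent
        children = []
        # an open nonterminal looks like "(NP"; completed constituents end with ")"
        while stack[-1].endswith(")") or not stack[-1].startswith("("):
            children.append(stack.pop())
        nt = stack.pop()
        stack.append(nt + " " + " ".join(reversed(children)) + ")")

print(" ".join(stack))
# (S (NP The hungry cat) (VP meows) .)
```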

Syntactic Composition. We need a representation for the constituent (NP The hungry cat). The composition function needs to know what head type it is building (here NP): it reads the nonterminal label NP together with the children The, hungry, cat and the closing bracket, and produces a single vector that stands for the whole constituent.
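In the RNNG paper the composition function is implemented with a bidirectional LSTM over the nonterminal embedding and the child vectors; below is a minimal sketch of that idea, assuming PyTorch, with illustrative class names and dimensions.

```python
import torch
import torch.nn as nn

class Composer(nn.Module):
    """Compose a labeled constituent, e.g. (NP The hungry cat), into one vector."""
    def __init__(self, dim):
        super().__init__()
        self.bilstm = nn.LSTM(dim, dim, bidirectional=True, batch_first=True)
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, nt_embedding, child_vectors):
        # Read the nonterminal label first, then the children, in both directions.
        seq = torch.stack([nt_embedding] + child_vectors).unsqueeze(0)  # (1, n+1, dim)
        _, (h, _) = self.bilstm(seq)                                    # h: (2, 1, dim)
        fwd, bwd = h[0, 0], h[1, 0]
        return torch.tanh(self.proj(torch.cat([fwd, bwd])))            # (dim,)

# Usage: compose (NP The hungry cat) from word vectors and the NP label embedding.
dim = 64
compose = Composer(dim)
np_label = torch.randn(dim)
words = [torch.randn(dim) for _ in ("The", "hungry", "cat")]
np_vector = compose(np_label, words)   # a single vector standing for the whole NP
```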

Recursion. We also need representations for nested constituents such as (NP The (ADJP very hungry) cat). The embedded (ADJP very hungry) is first composed into a single vector v; v then takes the place of a word when the outer NP is composed: ( NP The v cat ).

Stack symbols composed recursively mirror the corresponding tree structure. Effect: the stack encodes top-down syntactic recency, rather than left-to-right string recency.

Implementing RNNGs: Stack RNNs. Augment a sequential RNN with a stack pointer. Two constant-time operations: push reads an input, adds it to the top of the stack, and connects it to the current location of the stack pointer; pop moves the stack pointer to its parent. A summary of the stack contents is obtained by accessing the output of the RNN at the location of the stack pointer. Note: push and pop are discrete actions here (cf. Grefenstette et al., 2015).

Implementing RNNGs: Stack RNNs (figure). A sequence of PUSH and POP operations produces states y0, y1, y2, y3 from inputs x1, x2, x3; PUSH extends the RNN from the state at the stack pointer, and POP moves the pointer back without discarding previously computed states.
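A minimal stack RNN sketch, assuming PyTorch (class and variable names are illustrative): each stack element stores the RNN state plus a pointer to its parent, so push extends the RNN from the current top and pop just moves the pointer back. Both are constant time, and popped states are not destroyed.

```python
import torch
import torch.nn as nn

class StackRNN:
    """An RNN augmented with a stack pointer; push and pop are constant-time."""
    def __init__(self, cell, h0):
        self.cell = cell                  # an nn.RNNCell
        self.top = (h0, None)             # (hidden state, parent): the empty stack

    def push(self, x):
        h, _ = self.top
        h_new = self.cell(x.unsqueeze(0), h)      # advance the RNN from the current top
        self.top = (h_new, self.top)              # new node points back to its parent

    def pop(self):
        _, parent = self.top
        self.top = parent                          # just move the pointer back

    def summary(self):
        return self.top[0]                         # RNN output at the stack pointer

dim = 8
stack = StackRNN(nn.RNNCell(dim, dim), torch.zeros(1, dim))
x1, x2 = torch.randn(dim), torch.randn(dim)
stack.push(x1)     # y1
stack.pop()        # back to y0 (y1 is not destroyed)
stack.push(x1)     # y1 again, recomputed from y0
stack.push(x2)     # y2
print(stack.summary().shape)   # torch.Size([1, 8])
```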

The evolution of the stack LSTM over time mirrors the tree structure. As the traversal S( NP( The hungry cat ) VP( meows ) . ) is generated, constituents such as NP and VP are opened, extended, and reduced, and the stack-top pointer moves accordingly, tracing out the tree S over NP and VP.

Each word is conditioned on the history, represented by a trio of RNNs: for example, p(meows | history) when generating S( NP( The hungry cat ) VP( meows ) . )

Train with backpropagation through structure. In training, backpropagate through these three RNNs, and recursively through the composed tree structure. The network is dynamic: don't derive gradients by hand (that's error-prone); use automatic differentiation instead.
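A toy illustration of that point (PyTorch assumed, not the RNNG code): the computation graph is built on the fly with a variable number of steps, and autograd still delivers exact gradients for whatever graph was actually built.

```python
import torch

# Dynamic graph: the number of steps depends on the (variable-length) action sequence,
# so the graph differs per sentence; autograd still gives exact gradients.
W = torch.randn(8, 8, requires_grad=True)
h = torch.zeros(8)
for _ in range(torch.randint(3, 7, (1,)).item()):   # variable number of steps
    h = torch.tanh(W @ h + torch.randn(8))
loss = h.sum()
loss.backward()          # gradients for W through the graph that was actually built
print(W.grad.shape)      # torch.Size([8, 8])
```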

Complete model. The sequence of actions a completely defines the sentence x and the tree y, so the joint probability factors as
p(x, y) = ∏_t p(a_t | a_<t),
where a_<t denotes the actions up to time t. Each factor is a softmax over the allowable actions at this step:
p(a_t | a_<t) = exp(r_{a_t}ᵀ u_t + b_{a_t}) / Σ_{a′ ∈ A_t} exp(r_{a′}ᵀ u_t + b_{a′}),
with action embedding r_a, bias b_a, and history embedding u_t. The model is dynamic: there is a variable number of context-dependent allowable actions at each step.

Complete model (architecture). The history embedding combines three RNNs: the stack, the output buffer (terminals generated so far), and the action history.
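A sketch of this step, assuming PyTorch (all names and sizes are illustrative): concatenate the three RNN summaries, apply an affine transform and tanh to get u_t, then take a softmax over only the allowable actions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

dim, n_actions = 64, 100          # illustrative sizes; actions are all NT(X), GEN(w), REDUCE
W = nn.Linear(3 * dim, dim)       # combines the three history summaries
r = nn.Embedding(n_actions, dim)  # action embeddings
b = nn.Parameter(torch.zeros(n_actions))  # per-action bias

def action_distribution(stack_h, buffer_h, history_h, allowable):
    """p(a_t | history), normalized over only the currently allowable actions."""
    u = torch.tanh(W(torch.cat([stack_h, buffer_h, history_h])))   # history embedding u_t
    scores = r.weight @ u + b                                      # r_a^T u_t + b_a
    mask = torch.full((n_actions,), float("-inf"))
    mask[allowable] = 0.0                                          # rule out invalid actions
    return F.softmax(scores + mask, dim=0)

probs = action_distribution(torch.randn(dim), torch.randn(dim), torch.randn(dim),
                            allowable=[0, 3, 7])
print(probs.nonzero(as_tuple=True)[0])   # only the allowable actions get probability mass
```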

Implementing RNNGs: Parameter Estimation. RNNGs jointly model sequences of words together with a tree structure, p(x, y). Any parse tree can be converted to a sequence of actions via a depth-first traversal, and vice versa (subject to well-formedness constraints); see the sketch below. We use trees from the Penn Treebank. We could instead treat the non-generation actions as latent variables or learn them with RL, effectively making this a problem of grammar induction. Future work.
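A sketch of that conversion (same hypothetical tuple encoding as the earlier traversal sketch): a depth-first walk emits NT(X) when a constituent opens, GEN(w) for each word, and REDUCE when it closes.

```python
def oracle(node):
    """Convert a parse tree to the generative RNNG action sequence (depth-first)."""
    if isinstance(node, str):                     # terminal word
        return [f"GEN({node})"]
    label, children = node
    acts = [f"NT({label})"]                       # open the constituent
    for child in children:
        acts.extend(oracle(child))
    acts.append("REDUCE")                         # close it
    return acts

tree = ("S", [("NP", ["The", "hungry", "cat"]), ("VP", ["meows"]), "."])
print(oracle(tree))
# ['NT(S)', 'NT(NP)', 'GEN(The)', 'GEN(hungry)', 'GEN(cat)', 'REDUCE',
#  'NT(VP)', 'GEN(meows)', 'REDUCE', 'GEN(.)', 'REDUCE']
```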

Implementing RNNGs: Inference. An RNNG is a joint distribution p(x, y) over strings (x) and parse trees (y). We are interested in two inference questions: What is p(x) for a given x? [language modeling] What is argmax_y p(y | x) for a given x? [parsing] Unfortunately, the dynamic programming algorithms we often rely on are of no help here. We can use importance sampling to do both, by sampling from a discriminatively trained model.

English PTB (Parsing)

                                       Type   F1
Petrov and Klein (2007)                G      90.1
Shindo et al. (2012), single model     G      91.1
Shindo et al. (2012), ensemble         ~G     92.4
Vinyals et al. (2015), PTB only        D      90.5
Vinyals et al. (2015), ensemble        S      92.8
Discriminative                         D      89.8
Generative (IS)                        G      92.4

Importance Sampling. Assume we have a conditional proposal distribution q(y | x) such that (i) p(x, y) > 0 ⇒ q(y | x) > 0, (ii) sampling y ~ q(y | x) is tractable, and (iii) evaluating q(y | x) is tractable. Let the importance weights be w(x, y) = p(x, y) / q(y | x). Then

p(x) = Σ_{y ∈ Y(x)} p(x, y) = Σ_{y ∈ Y(x)} w(x, y) q(y | x) = E_{y ~ q(y | x)}[w(x, y)]

Importance Sampling. Replace this expectation with its Monte Carlo estimate, using samples y^(i) ~ q(y | x) for i ∈ {1, 2, ..., N}:

E_{q(y | x)}[w(x, y)] ≈ (1/N) Σ_{i=1}^{N} w(x, y^(i))
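A numerically stable sketch of this estimator (illustrative; the helper functions sample_from_q, log_q, and log_joint are hypothetical stand-ins for a trained discriminative proposal and the RNNG joint model): work in log space and combine the log importance weights with logsumexp.

```python
import numpy as np
from scipy.special import logsumexp

def estimate_log_px(x, sample_from_q, log_q, log_joint, n_samples=100):
    """Importance-sampling estimate of log p(x) using proposal q(y | x).

    Hypothetical helpers:
      sample_from_q(x) -> a sampled tree y
      log_q(y, x)      -> log q(y | x)
      log_joint(x, y)  -> log p(x, y) under the generative model
    """
    log_w = []
    for _ in range(n_samples):
        y = sample_from_q(x)
        log_w.append(log_joint(x, y) - log_q(y, x))    # log importance weight
    # log( (1/N) * sum_i exp(log_w_i) ), computed stably
    return logsumexp(log_w) - np.log(n_samples)
```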

English PTB (LM)           Perplexity
5-gram IKN                 169.3
LSTM + Dropout             113.4
Generative (IS)            102.4

Chinese CTB (LM)           Perplexity
5-gram IKN                 255.2
LSTM + Dropout             207.3
Generative (IS)            171.9

Do we need a stack? (Kuncoro et al., Oct 2017) Both the stack and the action history encode the same information, but they expose it to the classifier in different ways. Leaving out the stack is harmful; using the stack on its own works slightly better than the complete model!

RNNG as a mini-linguist. Replace the composition function with one that computes attention over the objects in the composed sequence, using the embedding of the nonterminal for similarity. What does this learn?
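A minimal sketch of such an attention-based composition, assuming PyTorch (names are illustrative): the nonterminal embedding serves as the query over the child vectors, the composed vector is the attention-weighted sum, and the learned weights can be inspected to see which children each constituent type relies on.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionComposer(nn.Module):
    """Compose a constituent as an attention-weighted sum of its children,
    with the nonterminal embedding used as the query for similarity."""
    def __init__(self, dim):
        super().__init__()
        self.query = nn.Linear(dim, dim)

    def forward(self, nt_embedding, child_vectors):
        children = torch.stack(child_vectors)                    # (n, dim)
        scores = children @ self.query(nt_embedding)             # (n,) similarity to NT
        weights = F.softmax(scores, dim=0)                       # attention over children
        composed = weights @ children                            # (dim,) constituent vector
        return composed, weights                                 # weights are inspectable

dim = 64
compose = AttentionComposer(dim)
np_label = torch.randn(dim)
words = [torch.randn(dim) for _ in ("The", "hungry", "cat")]
vec, attn = compose(np_label, words)
print(attn)   # which child the NP label attends to most
```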

Summary. Language is hierarchical, and this inductive bias can be encoded into an RNN-style model. RNNGs work by simulating a tree traversal, like a pushdown automaton but with a continuous rather than finite history, modeled by RNNs encoding (1) previous terminal symbols, (2) previous actions, and (3) the stack contents. A stack LSTM evolves with the stack contents, and the final representation it computes has a top-down syntactic recency bias rather than a left-to-right string recency bias, which may be useful in modeling sentences. RNNGs are effective for parsing and language modeling, and seem to capture linguistic intuitions about headedness.