What do RNN Language Models Learn about Filler-Gap Dependencies?

Ethan Wilcox; Roger Levy; Takashi Morita; Richard Futrell

arXiv:1809.00042·cs.CL·September 5, 2018

TL;DR

This paper tests whether state-of-the-art RNN language models represent long-distance filler-gap dependencies and the constraints on them, using experimentally controlled sentences designed to expose those dependencies.

Contribution

The paper shows that RNN language models represent filler-gap dependencies in multiple syntactic positions and over large spans of text, and that they learn a subset of the island constraints that restrict those dependencies.

Findings

01

RNNs can represent filler-gap relationships across large text spans.

02

RNNs show evidence of learning a subset of island constraints: wh-islands, adjunct islands, and complex NP islands.

03

RNNs generalize about empty syntactic positions.

Abstract

RNN language models have achieved state-of-the-art perplexity results and have proven useful in a suite of NLP tasks, but it is as yet unclear what syntactic generalizations they learn. Here we investigate whether state-of-the-art RNN language models represent long-distance filler-gap dependencies and constraints on them. Examining RNN behavior on experimentally controlled sentences designed to expose filler-gap dependencies, we show that RNNs can represent the relationship in multiple syntactic positions and over large spans of text. Furthermore, we show that RNNs learn a subset of the known restrictions on filler-gap dependencies, known as island constraints: RNNs show evidence for wh-islands, adjunct islands, and complex NP islands. These studies demonstrate that state-of-the-art RNN models are able to learn and generalize about empty syntactic positions.
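The abstract does not say how "RNN behavior" on the controlled sentences is scored. As a rough sketch only, one way to quantify it is to assume the language model's surprisal (negative log probability, in bits) at a critical region is the measure, and to assume a 2x2 design that crosses the presence of a filler with the presence of a gap. The function names, the sign convention, and the probabilities below are all hypothetical illustrations, not values or code from the paper.

```python
import math


def surprisal(prob: float) -> float:
    """Surprisal in bits of an event the model assigns probability `prob`."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("prob must be in (0, 1]")
    return -math.log2(prob)


def licensing_interaction(s: dict) -> float:
    """Difference-in-differences over an assumed 2x2 design.

    `s` maps (has_filler, has_gap) -> surprisal in bits at the critical
    region. Sign convention (an assumption of this sketch): a negative
    value means a gap is less costly when a filler is present.
    """
    gap_cost_without_filler = s[(False, True)] - s[(False, False)]
    gap_cost_with_filler = s[(True, True)] - s[(True, False)]
    return gap_cost_with_filler - gap_cost_without_filler


# Made-up model probabilities for the critical region in each condition.
hypothetical_probs = {
    (False, False): 0.5,     # no filler, no gap
    (False, True): 0.0625,   # no filler, gap (unlicensed gap)
    (True, False): 0.125,    # filler, no gap (filler left unresolved)
    (True, True): 0.25,      # filler, gap (licensed gap)
}
surprisals = {cond: surprisal(p) for cond, p in hypothetical_probs.items()}
interaction = licensing_interaction(surprisals)
print(interaction)
```

Under this convention, a model that tracks the dependency would yield a clearly negative interaction in ordinary sentences, and a value near zero inside an island would be consistent with the model treating the gap as unlicensed there.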
