Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies

Tal Linzen; Emmanuel Dupoux; Yoav Goldberg

arXiv:1611.01368 · cs.CL · November 7, 2016 · 27 cites

TL;DR

This paper investigates whether LSTM neural networks can learn syntax-sensitive dependencies such as subject-verb agreement. It finds that they perform well when given explicit grammatical supervision but struggle when trained only with a language-modeling objective. This suggests that stronger architectures or stronger supervision may be needed.

Contribution

The study shows that LSTMs can learn syntax-sensitive dependencies when explicitly supervised, but make substantially more errors when trained only as language models. This highlights the importance of targeted training signals.

Findings

1. LSTMs achieve high accuracy (under 1% errors overall) with explicit grammatical supervision.
2. Errors increase when sequential and structural cues conflict.
3. A language-modeling objective alone is insufficient for reliably capturing syntax-sensitive dependencies.

Abstract

The success of long short-term memory (LSTM) neural networks in language processing is typically attributed to their ability to capture long-distance statistical regularities. Linguistic regularities are often sensitive to syntactic structure; can such dependencies be captured by LSTMs, which do not have explicit structural representations? We begin addressing this question using number agreement in English subject-verb dependencies. We probe the architecture's grammatical competence both using training objectives with an explicit grammatical target (number prediction, grammaticality judgments) and using language models. In the strongly supervised settings, the LSTM achieved very high overall accuracy (less than 1% errors), but errors increased when sequential and structural information conflicted. The frequency of such errors rose sharply in the language-modeling setting. We conclude…
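The number-prediction task mentioned in the abstract, and the kind of conflict between sequential and structural information it probes, can be illustrated with a small sketch. This is not the authors' code. Tokens here carry hand-written part-of-speech and number tags, and the helper names `Token`, `number_prediction_example`, `count_attractors` and `last_noun_heuristic` are hypothetical. The model input is the sentence prefix up to (but not including) the verb, and the label is the verb's number. Intervening nouns whose number differs from the subject's (often called agreement attractors) make a purely sequential strategy fail.

```python
from dataclasses import dataclass


@dataclass
class Token:
    word: str
    pos: str          # hypothetical tagset: "N" noun, "V" verb, "O" other
    number: str = ""  # "sg", "pl", or "" when number is not marked


def number_prediction_example(tokens, verb_idx):
    """Return (prefix, label): the words before the verb and the verb's number."""
    prefix = [t.word for t in tokens[:verb_idx]]
    return prefix, tokens[verb_idx].number


def count_attractors(tokens, subj_idx, verb_idx):
    """Count nouns between subject and verb whose number differs from the subject's."""
    subj_number = tokens[subj_idx].number
    return sum(
        1
        for t in tokens[subj_idx + 1 : verb_idx]
        if t.pos == "N" and t.number != subj_number
    )


def last_noun_heuristic(prefix_tokens):
    """Sequential baseline (assumed for illustration): copy the number of the
    most recent noun, defaulting to "sg" if the prefix contains no noun."""
    for t in reversed(prefix_tokens):
        if t.pos == "N":
            return t.number
    return "sg"


# Illustrative sentence: "The keys to the cabinet are on the table."
sentence = [
    Token("The", "O"), Token("keys", "N", "pl"), Token("to", "O"),
    Token("the", "O"), Token("cabinet", "N", "sg"), Token("are", "V", "pl"),
    Token("on", "O"), Token("the", "O"), Token("table", "N", "sg"),
]

prefix, label = number_prediction_example(sentence, verb_idx=5)
print(prefix, label)                      # ['The', 'keys', 'to', 'the', 'cabinet'] pl
print(count_attractors(sentence, 1, 5))   # 1
print(last_noun_heuristic(sentence[:5]))  # sg (wrong: the subject "keys" is plural)
```

A model that tracks the structural subject must output the plural label here, even though the noun immediately before the verb is singular. According to the abstract, LSTM errors concentrate in exactly these conflicting cases.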

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory