A Systematic Assessment of Syntactic Generalization in Neural Language Models

Jennifer Hu; Jon Gauthier; Peng Qian; Ethan Wilcox; Roger P. Levy

arXiv:2005.03692·cs.CL·May 26, 2020·26 cites

TL;DR

This paper systematically evaluates how neural language model architecture and training data size affect the ability to learn human-like syntax. It finds that architecture matters more than data size, and that perplexity is dissociated from syntactic generalization performance.

Contribution

The paper provides a systematic analysis of syntactic generalization across multiple model architectures and training data sizes, showing that architecture accounts for more of the variation in performance than dataset size does.

Findings

01. Sequential models underperform other architectures in syntax tasks.

02. Model architecture influences syntactic generalization more than training data size.

03. Perplexity does not reliably indicate syntactic understanding.
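
The findings above depend on scoring models against syntactic test suites rather than by perplexity alone. The following is a minimal sketch of one way such scoring can work, assuming each test item supplies a model's surprisal at a critical region for a grammatical and an ungrammatical variant. The function names, condition labels, and surprisal values are hypothetical and chosen for illustration only; they are not taken from the paper or its repository.

```python
from statistics import mean


def item_passes(surprisal, prediction):
    """Return True if every (lower, higher) inequality holds for one item.

    `surprisal` maps a condition name to surprisal (in bits) at the
    critical region; `prediction` lists pairs (a, b) meaning that
    condition a should be less surprising than condition b.
    """
    return all(surprisal[low] < surprisal[high] for low, high in prediction)


def suite_accuracy(items, prediction):
    """Fraction of items in a suite whose predictions all hold."""
    return mean(1.0 if item_passes(item, prediction) else 0.0 for item in items)


def aggregate_score(suite_accuracies):
    """Unweighted mean accuracy across suites (an assumed aggregation)."""
    return mean(suite_accuracies)


# Made-up surprisal values for a hypothetical agreement suite.
agreement_items = [
    {"grammatical": 3.1, "ungrammatical": 7.4},
    {"grammatical": 5.0, "ungrammatical": 4.2},  # model gets this one wrong
    {"grammatical": 2.2, "ungrammatical": 6.0},
    {"grammatical": 1.5, "ungrammatical": 9.3},
]
prediction = [("grammatical", "ungrammatical")]

accuracy = suite_accuracy(agreement_items, prediction)
print(accuracy)  # 0.75
```

Because a score of this kind depends only on relative surprisal within an item, two models with similar perplexity can still differ on it, which is consistent with finding 03.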

Abstract

While state-of-the-art neural network models continue to achieve lower perplexity scores on language modeling benchmarks, it remains unknown whether optimizing for broad-coverage predictive performance leads to human-like syntactic knowledge. Furthermore, existing work has not provided a clear picture about the model properties required to produce proper syntactic generalizations. We present a systematic evaluation of the syntactic knowledge of neural language models, testing 20 combinations of model types and data sizes on a set of 34 English-language syntactic test suites. We find substantial differences in syntactic generalization performance by model architecture, with sequential models underperforming other architectures. Factorially manipulating model architecture and training dataset size (1M to 40M words), we find that variability in syntactic generalization performance is…

Code & Models

Repositories

cpllab/syntactic-generalization (Official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Softmax