A Systematic Assessment of Syntactic Generalization in Neural Language Models
Jennifer Hu, Jon Gauthier, Peng Qian, Ethan Wilcox, Roger P. Levy

TL;DR
This paper systematically evaluates how neural language model architecture and training data size affect a model's ability to learn human-like syntax. It finds that architecture matters more than data size, and that lower perplexity does not reliably translate into better syntactic generalization.
Contribution
It provides a systematic analysis of syntactic generalization across 20 combinations of model type and training data size, evaluated on 34 English-language syntactic test suites, and shows that architecture plays a larger role than dataset size.
Findings
Sequential models underperform other architectures in syntax tasks.
Model architecture influences syntactic generalization more than training data size.
Perplexity does not reliably indicate syntactic understanding.
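To make the difference between the two kinds of measurement concrete, here is a minimal sketch, not the authors' code. It assumes that a test-suite item counts as passed when the model assigns lower surprisal to the grammatical variant than to the ungrammatical one at a critical region. That criterion, the function names, and all numbers below are illustrative assumptions, not details taken from the paper.

```python
import math

def perplexity(token_logprobs):
    """Perplexity from natural-log token probabilities over a corpus."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def suite_accuracy(items):
    """Fraction of items where the grammatical variant has strictly lower
    surprisal than the ungrammatical variant (assumed pass criterion)."""
    passed = sum(1 for grammatical, ungrammatical in items
                 if grammatical < ungrammatical)
    return passed / len(items)

# Toy, made-up inputs for illustration only.
toy_logprobs = [-2.0, -2.0, -2.0, -2.0]   # per-token log-probabilities
toy_items = [                             # (grammatical, ungrammatical) surprisals
    (3.0, 5.0),
    (4.0, 6.0),
    (7.0, 2.0),   # fails: ungrammatical variant is less surprising
    (1.0, 8.0),
]

print(perplexity(toy_logprobs))
print(suite_accuracy(toy_items))  # 0.75
```

Perplexity averages over every token in a corpus, while the suite score depends only on a handful of targeted comparisons, so a model can improve on one without improving on the other.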
Abstract
While state-of-the-art neural network models continue to achieve lower perplexity scores on language modeling benchmarks, it remains unknown whether optimizing for broad-coverage predictive performance leads to human-like syntactic knowledge. Furthermore, existing work has not provided a clear picture about the model properties required to produce proper syntactic generalizations. We present a systematic evaluation of the syntactic knowledge of neural language models, testing 20 combinations of model types and data sizes on a set of 34 English-language syntactic test suites. We find substantial differences in syntactic generalization performance by model architecture, with sequential models underperforming other architectures. Factorially manipulating model architecture and training dataset size (1M--40M words), we find that variability in syntactic generalization performance is…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Softmax
