Structural Supervision Improves Few-Shot Learning and Syntactic Generalization in Neural Language Models
Ethan Wilcox, Peng Qian, Richard Futrell, Ryosuke Kohita, Roger Levy, and Miguel Ballesteros

TL;DR
This paper investigates how structural supervision enhances neural language models' ability to learn and generalize syntactic properties from minimal data, demonstrating improved few-shot learning and invariance in English.
Contribution
The study shows that structural supervision significantly improves neural models' syntactic generalization and transfer capabilities in few-shot learning scenarios.
Findings
Neural models can learn syntactic generalizations from as few as two examples.
Structurally supervised models outperform standard LSTMs in generalization accuracy.
Models exhibit invariance properties, transferring syntactic knowledge across contexts.
Abstract
Humans can learn structural properties about a word from minimal experience, and deploy their learned syntactic representations uniformly in different grammatical contexts. We assess the ability of modern neural language models to reproduce this behavior in English and evaluate the effect of structural supervision on learning outcomes. First, we assess few-shot learning capabilities by developing controlled experiments that probe models' syntactic nominal number and verbal argument structure generalizations for tokens seen as few as two times during training. Second, we assess invariance properties of learned representation: the ability of a model to transfer syntactic generalizations from a base context (e.g., a simple declarative active-voice sentence) to a transformed context (e.g., an interrogative sentence). We test four models trained on the same dataset: an n-gram baseline, an…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
MethodsTanh Activation · Sigmoid Activation · Long Short-Term Memory
