Sneaking Syntax into Transformer Language Models with Tree Regularization
Ananjan Nandi, Christopher D. Manning, Shikhar Murty

TL;DR
This paper introduces TreeReg, a regularizer that injects syntactic structure into transformer language models, improving their syntactic understanding and out-of-distribution performance without altering model architecture.
Contribution
The work presents a novel differentiable regularizer, TreeReg, that softly encodes syntactic tree information into transformers, enhancing their linguistic generalization capabilities.
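Before any constraint can be applied, the syntactic tree has to be reduced to concrete bracketing decisions. The sketch below shows one way to do that. It assumes the silver parse has already been binarized and is encoded as nested tuples of token indices. Each binary constituent becomes a `(start, split, end)` triple: the left child covers `start..split-1` and the right child covers `split..end`, both inclusive. The tree encoding and the names `leaves` and `bracketing_decisions` are hypothetical illustrations, not the authors' code.

```python
from typing import List, Tuple, Union

# Hypothetical encoding: a leaf is a token index (int); an internal node is
# a 2-tuple (left, right). The parse is assumed to be binarized already.
Tree = Union[int, Tuple["Tree", "Tree"]]


def leaves(tree: Tree) -> List[int]:
    """Token indices covered by a (sub)tree, left to right."""
    if isinstance(tree, int):
        return [tree]
    out: List[int] = []
    for child in tree:
        out.extend(leaves(child))
    return out


def bracketing_decisions(tree: Tree) -> List[Tuple[int, int, int]]:
    """Return one (start, split, end) triple per binary constituent.

    Tokens start..split-1 form the left child and split..end (inclusive)
    form the right child. Leaves contribute no decisions.
    """
    if isinstance(tree, int):
        return []
    left, right = tree  # raises if the node is not binary
    left_leaves, right_leaves = leaves(left), leaves(right)
    decisions = [(left_leaves[0], right_leaves[0], right_leaves[-1])]
    decisions.extend(bracketing_decisions(left))
    decisions.extend(bracketing_decisions(right))
    return decisions
```

For "the cat sat down" parsed as `((0, 1), (2, 3))`, this yields `[(0, 2, 3), (0, 1, 1), (2, 3, 3)]`: the sentence splits before token 2, and each two-word phrase splits in the middle.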
Findings
Up to 10% lower perplexity on out-of-distribution data.
Up to 9.5-point improvement in syntactic generalization.
Mitigates performance degradation on adversarial NLI benchmarks by 41.2 points.
Abstract
While compositional accounts of human language understanding are based on a hierarchical tree-like process, neural models like transformers lack a direct inductive bias for such tree structures. Introducing syntactic inductive biases could unlock more robust and data-efficient learning in transformer language models (LMs), but existing methods for incorporating such structure greatly restrict models, either limiting their expressivity or increasing inference complexity. This work instead aims to softly inject syntactic inductive biases into given transformer circuits, through a structured regularizer. We introduce TreeReg, an auxiliary loss function that converts bracketing decisions from silver parses into a set of differentiable orthogonality constraints on vector hidden states. TreeReg integrates seamlessly with the standard LM objective, requiring no architectural changes. LMs…
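The abstract says TreeReg turns bracketing decisions into differentiable orthogonality constraints on hidden states and adds them to the LM loss. The sketch below is a much simplified stand-in for that idea, not the paper's actual score. For each `(start, split, end)` decision, it penalizes the mean squared cosine similarity between the hidden states of the left child and those of the right child. The penalty is zero exactly when the two children are mutually orthogonal. The penalty function, its averaging, and the weight `lam` are all hypothetical. The sketch uses plain Python and only computes the loss value; a real implementation would express the same formula in an autodiff framework so that gradients reach the model.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector, eps: float = 1e-8) -> float:
    """Cosine similarity; eps guards against division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + eps)


def orthogonality_penalty(hidden: List[Vector], start: int, split: int, end: int) -> float:
    """Hypothetical constraint: mean squared cosine between left-child states
    (start..split-1) and right-child states (split..end). Zero iff orthogonal."""
    left = hidden[start:split]
    right = hidden[split:end + 1]
    total = sum(cosine(u, v) ** 2 for u in left for v in right)
    return total / (len(left) * len(right))


def tree_reg_loss(hidden: List[Vector], decisions: List[Tuple[int, int, int]]) -> float:
    """Average the penalty over all bracketing decisions of one sentence."""
    if not decisions:
        return 0.0
    return sum(orthogonality_penalty(hidden, *d) for d in decisions) / len(decisions)


def total_loss(lm_loss: float, hidden: List[Vector],
               decisions: List[Tuple[int, int, int]], lam: float = 0.5) -> float:
    """Standard LM loss plus the weighted auxiliary term (lam is hypothetical)."""
    return lm_loss + lam * tree_reg_loss(hidden, decisions)
```

Take the toy states `[[1, 0], [1, 0], [0, 1], [0, 1]]`. The top-level split `(0, 2, 3)` costs nothing, because its two halves are orthogonal. The split `(0, 1, 1)` costs about 1, because both of its tokens share a direction. Because the regularizer is just an extra additive term, the model architecture is untouched, which matches the abstract's claim.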
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
Methods: Sparse Evolutionary Training · LLaMA
