Sneaking Syntax into Transformer Language Models with Tree Regularization

Ananjan Nandi; Christopher D. Manning; Shikhar Murty

arXiv:2411.18885 · cs.CL · March 25, 2025

TL;DR

This paper introduces TreeReg, a regularizer that injects syntactic structure into transformer language models, improving their syntactic understanding and out-of-distribution performance without altering model architecture.

Contribution

The work presents a novel differentiable regularizer, TreeReg, that softly encodes syntactic tree information into transformers, enhancing their linguistic generalization capabilities.

Findings

01. Up to 10% lower perplexity on out-of-distribution data.

02. Up to 9.5 point improvement in syntactic generalization.

03. Mitigates performance degradation on adversarial NLI benchmarks by 41.2 points.

Abstract

While compositional accounts of human language understanding are based on a hierarchical tree-like process, neural models like transformers lack a direct inductive bias for such tree structures. Introducing syntactic inductive biases could unlock more robust and data-efficient learning in transformer language models (LMs), but existing methods for incorporating such structure greatly restrict models, either limiting their expressivity or increasing inference complexity. This work instead aims to softly inject syntactic inductive biases into given transformer circuits, through a structured regularizer. We introduce TreeReg, an auxiliary loss function that converts bracketing decisions from silver parses into a set of differentiable orthogonality constraints on vector hidden states. TreeReg integrates seamlessly with the standard LM objective, requiring no architectural changes. LMs…
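
The abstract is cut off above and does not state TreeReg's exact formulation, so the following is only a rough sketch of the general idea of turning bracketing decisions into an orthogonality penalty on hidden states. Everything in it is an assumption made for illustration, not the authors' method: the function names (`orthogonality_penalty`, `total_loss`), the use of mean-pooled span and context vectors, the squared-cosine penalty, and the `weight` parameter are all hypothetical. It is written in plain Python so the arithmetic is easy to follow; a real training setup would use an autodiff framework so the penalty can be backpropagated.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def squared_cosine(u, v, eps=1e-12):
    """Squared cosine similarity: 0 when u and v are orthogonal, near 1 when parallel."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u_sq = sum(a * a for a in u)
    norm_v_sq = sum(b * b for b in v)
    return (dot * dot) / (norm_u_sq * norm_v_sq + eps)


def orthogonality_penalty(hidden, spans):
    """Hypothetical stand-in for a tree regularizer (NOT the paper's definition).

    hidden: list of T hidden-state vectors, one per token.
    spans:  list of (start, end) half-open token ranges taken from a parse's brackets.

    For each bracketed span, penalize overlap between the mean hidden state
    inside the span and the mean hidden state of the surrounding context.
    """
    penalties = []
    for start, end in spans:
        inside = hidden[start:end]
        outside = hidden[:start] + hidden[end:]
        if not inside or not outside:
            continue  # a span covering the whole sentence has no context
        penalties.append(squared_cosine(mean_vector(inside), mean_vector(outside)))
    return sum(penalties) / len(penalties) if penalties else 0.0


def total_loss(lm_loss, hidden, spans, weight=0.1):
    """Standard LM loss plus the weighted auxiliary penalty (weight is an assumed value)."""
    return lm_loss + weight * orthogonality_penalty(hidden, spans)


# Toy example: four tokens with 2-d hidden states.
hidden = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
aligned = orthogonality_penalty(hidden, [(0, 2)])     # span orthogonal to its context
misaligned = orthogonality_penalty(hidden, [(1, 3)])  # span overlaps its context
```

In this toy setup the bracket `(0, 2)` incurs no penalty because the span and its context point in orthogonal directions, while the bracket `(1, 3)` is penalized. Because the penalty is simply added to the language modeling loss, a scheme of this kind leaves the model architecture and inference procedure unchanged, which is the property the abstract emphasizes.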

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis

Methods: Sparse Evolutionary Training · LLaMA