Understanding Syntactic Generalization in Structure-inducing Language Models
David Arps, Hassan Sajjad, Laura Kallmeyer

TL;DR
This paper evaluates three structure-inducing language models across multiple languages and datasets, revealing differences in their induced syntactic representations and showing that small models trained on synthetic data are effective for testing basic model properties.
Contribution
It provides a comprehensive comparison of three SiLM architectures on syntactic and performance metrics, emphasizing the importance of model architecture and training data in syntactic generalization.
Findings
GPST performs most consistently across evaluations.
Models differ significantly in induced syntactic representations.
Small models trained on synthetic data are effective for evaluating basic model properties.
Abstract
Structure-inducing Language Models (SiLMs) are trained on a self-supervised language modeling task and induce a hierarchical sentence representation as a byproduct when processing an input. SiLMs couple strong syntactic generalization behavior with competitive performance on various NLP tasks, but many of their basic properties remain underexplored. In this work, we train three different SiLM architectures from scratch: Structformer (Shen et al., 2021), UDGN (Shen et al., 2022), and GPST (Hu et al., 2024b). We train these architectures on both natural language (English, German, and Chinese) corpora and synthetic bracketing expressions. The models are then evaluated with respect to (i) properties of the induced syntactic representations, (ii) performance on grammaticality judgment tasks, and (iii) training dynamics. We find that none of the three architectures dominates across all…
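To make the idea of "synthetic bracketing expressions" concrete, here is a minimal Python sketch of how balanced bracket strings with a known gold structure could be generated. It is an illustration only: the function names (`random_bracketing`, `is_balanced`, `gold_spans`), the bracket inventory, and the 50% open/close sampling rule are assumptions, not the data format or generation procedure used in the paper.

```python
import random

# Hypothetical bracket inventory; the paper's actual vocabulary is not given here.
BRACKETS = [("(", ")"), ("[", "]"), ("{", "}"), ("<", ">")]
CLOSE_TO_OPEN = {c: o for o, c in BRACKETS}


def random_bracketing(n_pairs, n_types=2, rng=None):
    """Sample a balanced bracket sequence with n_pairs pairs (assumes 1 <= n_types <= 4)."""
    rng = rng or random.Random()
    inventory = BRACKETS[:n_types]
    tokens, stack, opened = [], [], 0
    while opened < n_pairs or stack:
        can_open = opened < n_pairs
        # Open a new bracket if nothing can be closed, else flip a fair coin.
        if can_open and (not stack or rng.random() < 0.5):
            open_tok, close_tok = rng.choice(inventory)
            tokens.append(open_tok)
            stack.append(close_tok)  # remember which closer must match
            opened += 1
        else:
            tokens.append(stack.pop())
    return tokens


def is_balanced(tokens):
    """Check that every closing bracket matches the most recent open one."""
    stack = []
    for tok in tokens:
        if tok in CLOSE_TO_OPEN:
            if not stack or stack.pop() != CLOSE_TO_OPEN[tok]:
                return False
        else:
            stack.append(tok)
    return not stack


def gold_spans(tokens):
    """Return (start, end) index pairs of matched brackets: the known hierarchy."""
    stack, spans = [], []
    for i, tok in enumerate(tokens):
        if tok in CLOSE_TO_OPEN:
            spans.append((stack.pop(), i))
        else:
            stack.append(i)
    return sorted(spans)


if __name__ == "__main__":
    sample = random_bracketing(n_pairs=6, n_types=2, rng=random.Random(0))
    print(" ".join(sample))
    print(gold_spans(sample))
```

Because the nesting of such strings is known by construction, the spans returned by `gold_spans` could serve as a reference against which a model's induced structure is compared, which is one plausible reason synthetic data of this kind is convenient for testing basic properties.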
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
