Composition, Attention, or Both?
Ryo Yoshida, Yohei Oseki

TL;DR
This paper introduces Composition Attention Grammars (CAGs), a new architecture combining recursive composition and self-attention to improve syntactic generalization in language models, making them more human-like.
Contribution
It presents a novel architecture that integrates composition functions with self-attention, demonstrating their combined effect on syntactic generalization in language models.
Findings
Both the composition function and the self-attention mechanism improve human-like syntactic generalization.
The composition function helps propagate syntactic features.
Models with these components outperform baselines on SyntaxGym.
Abstract
In this paper, we propose a novel architecture called Composition Attention Grammars (CAGs) that recursively composes subtrees into a single vector representation with a composition function, and selectively attends to previous structural information with a self-attention mechanism. We investigate whether these components -- the composition function and the self-attention mechanism -- can both induce human-like syntactic generalization. Specifically, we train language models (LMs) with and without these two components with the model sizes carefully controlled, and evaluate their syntactic generalization performance against six test circuits on the SyntaxGym benchmark. The results demonstrated that the composition function and the self-attention mechanism both play an important role in making LMs more human-like, and closer inspection of linguistic phenomena implied that the composition…
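To make the two components named in the abstract concrete, here is a minimal NumPy sketch of a stack that (a) collapses a finished subtree into one vector and (b) attends from the newest stack element over all earlier ones. It is an illustration under stated assumptions, not the authors' implementation: the hidden size `DIM`, the random weight matrices, the mean-pool-plus-`tanh` composition, the single attention head, and the helper names `compose`, `attend`, and `reduce_stack` are all hypothetical choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # hypothetical hidden size

# Hypothetical parameters; a trained model would learn these.
W_compose = rng.normal(scale=0.1, size=(2 * DIM, DIM))
W_q, W_k, W_v = (rng.normal(scale=0.1, size=(DIM, DIM)) for _ in range(3))


def compose(label_vec, child_vecs):
    """Collapse a closed subtree into one vector.

    Stand-in composition function (assumption): mean-pool the children,
    concatenate with the nonterminal label vector, project, apply tanh.
    """
    pooled = np.mean(child_vecs, axis=0)
    return np.tanh(np.concatenate([label_vec, pooled]) @ W_compose)


def attend(stack):
    """Single-head scaled dot-product attention from the top of the stack
    over every stack entry (including itself)."""
    H = np.stack(stack)              # shape (n, DIM)
    q = H[-1] @ W_q                  # query from the most recent element
    K, V = H @ W_k, H @ W_v
    scores = K @ q / np.sqrt(DIM)
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ V, weights


def reduce_stack(stack, open_index):
    """Replace the open nonterminal at open_index and its children
    with a single composed vector."""
    label_vec, children = stack[open_index], stack[open_index + 1:]
    return stack[:open_index] + [compose(label_vec, children)]


# Usage: four stack entries; entry 1 is an open nonterminal with two children.
stack = [rng.normal(size=DIM) for _ in range(4)]
stack = reduce_stack(stack, open_index=1)   # subtree becomes one vector
context, weights = attend(stack)            # attend over remaining entries
```

After the reduce step the stack holds two vectors instead of four, so the attention weights range over whole constituents rather than individual tokens. That is the intuition behind combining the two mechanisms.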
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Test
