Composition, Attention, or Both?

Ryo Yoshida; Yohei Oseki

arXiv:2210.12958·cs.CL·August 20, 2025



TL;DR

This paper introduces Composition Attention Grammars (CAGs), a new architecture combining recursive composition and self-attention to improve syntactic generalization in language models, making them more human-like.

Contribution

It presents a novel architecture that integrates a composition function with a self-attention mechanism, and uses size-controlled comparisons of models with and without each component to show how both affect syntactic generalization in language models.
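The Contribution above pairs recursive composition with self-attention. As a rough mental model only, here is a toy Python sketch of how the two pieces could interact in a stack-based syntactic LM. Several things are assumptions, not the paper's method: the RNNG-style action set (`NT`, `GEN`, `REDUCE`), the averaging-plus-tanh `compose` (a hypothetical stand-in for the paper's composition function), the identity projections in `attend`, and attending over the current stack. The names `compose`, `attend`, `run`, and `embed` are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def compose(label: Vector, children: List[Vector]) -> Vector:
    """Hypothetical toy composition (not the paper's function): average the
    children, add the nonterminal embedding, and squash with tanh."""
    if not children:
        raise ValueError("a constituent needs at least one child")
    dim = len(label)
    avg = [sum(c[i] for c in children) / len(children) for i in range(dim)]
    return [math.tanh(avg[i] + label[i]) for i in range(dim)]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, memory: List[Vector]) -> Vector:
    """Single-head scaled dot-product attention; identity projections are a
    simplifying assumption (real models learn query/key/value matrices)."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in memory]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, memory)) for i in range(len(query))]


def run(actions: List[Tuple[str, ...]], embed: Dict[str, Vector]) -> List[Vector]:
    """Process hypothetical RNNG-style actions and return the stack vectors."""
    stack: List[Tuple[Vector, bool]] = []  # (vector, is_open_nonterminal)
    for act in actions:
        if act[0] == "NT":  # open a constituent, e.g. ("NT", "NP")
            stack.append((embed[act[1]], True))
        elif act[0] == "GEN":  # generate a word, e.g. ("GEN", "dog")
            stack.append((embed[act[1]], False))
        elif act[0] == "REDUCE":  # close the most recent open constituent
            children = []
            while not stack[-1][1]:
                children.append(stack.pop()[0])
            label_vec, _ = stack.pop()  # the open nonterminal itself
            children.reverse()  # restore left-to-right order
            stack.append((compose(label_vec, children), False))
        else:
            raise ValueError(f"unknown action: {act}")
    return [vec for vec, _ in stack]


embed = {"S": [0.1, 0.0], "NP": [0.0, 0.2], "VP": [0.3, 0.1],
         "the": [1.0, 0.0], "dog": [0.0, 1.0], "barks": [0.5, 0.5]}
prefix = [("NT", "S"), ("NT", "NP"), ("GEN", "the"), ("GEN", "dog"), ("REDUCE",)]
stack = run(prefix, embed)
print(len(stack))  # 2: the still-open S and the composed NP vector
context = attend(stack[-1], stack)  # summary a next-action predictor could use
```

The point the sketch tries to make: `REDUCE` collapses a whole subtree into one vector, so attention ranges over a few composed constituents rather than over every word generated so far.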

Findings

1. Both components improve human-like syntactic generalization.
2. The composition function helps propagate syntactic features.
3. Models with these components outperform counterparts without them on SyntaxGym.

Abstract

In this paper, we propose a novel architecture called Composition Attention Grammars (CAGs) that recursively compose subtrees into a single vector representation with a composition function, and selectively attend to previous structural information with a self-attention mechanism. We investigate whether these components (the composition function and the self-attention mechanism) can both induce human-like syntactic generalization. Specifically, we train language models (LMs) with and without these two components, with model sizes carefully controlled, and evaluate their syntactic generalization performance against six test circuits on the SyntaxGym benchmark. The results demonstrated that the composition function and the self-attention mechanism both play an important role in making LMs more human-like, and closer inspection of linguistic phenomena implied that the composition…
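The SyntaxGym evaluation mentioned in the abstract scores a model by comparing surprisal at a critical region across minimally different conditions. Here is a minimal sketch of that success criterion, under stated assumptions. Token probabilities are taken as given; for a syntactic LM such as a CAG, computing them typically means approximately marginalizing over partial parses, which this sketch does not attempt. The condition names and probabilities are made up for illustration, and `item_passes`, `suite_accuracy`, and `circuit_score` are hypothetical helpers, not the SyntaxGym API.

```python
import math
from typing import Dict, List

Item = Dict[str, List[float]]  # condition name -> token probabilities in the critical region


def surprisal(prob: float) -> float:
    """Surprisal in bits: -log2 p(token | prefix)."""
    return -math.log2(prob)


def region_surprisal(token_probs: List[float]) -> float:
    """Sum token surprisals over a critical region (e.g. an agreeing verb)."""
    return sum(surprisal(p) for p in token_probs)


def item_passes(item: Item, expected_lower: str, expected_higher: str) -> bool:
    """One inequality prediction: the grammatical condition should be less surprising."""
    return region_surprisal(item[expected_lower]) < region_surprisal(item[expected_higher])


def suite_accuracy(items: List[Item], expected_lower: str, expected_higher: str) -> float:
    """Fraction of items in a test suite whose prediction holds."""
    passed = sum(item_passes(it, expected_lower, expected_higher) for it in items)
    return passed / len(items)


def circuit_score(suite_accuracies: List[float]) -> float:
    """Average suite accuracies within a circuit (assumed unweighted mean)."""
    return sum(suite_accuracies) / len(suite_accuracies)


# Made-up probabilities for two toy agreement items; not results from the paper.
items = [
    {"match": [0.2], "mismatch": [0.05]},  # passes: match is less surprising
    {"match": [0.1], "mismatch": [0.3]},   # fails: mismatch is less surprising
]
acc = suite_accuracy(items, "match", "mismatch")
print(acc)  # 0.5
```

Measuring surprisal in bits rather than nats does not change any pass/fail outcome: each prediction is an inequality, and changing the log base only rescales both sides by the same positive factor.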


Code & Models

Repositories

osekilab/cag (PyTorch, official)


Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
