Complexity Control Facilitates Reasoning-Based Compositional Generalization in Transformers

Zhongwang Zhang; Pengxiao Lin; Zhiwei Wang; Yaoyu Zhang; Zhi-Qin John Xu

arXiv:2501.08537·cs.CL·January 16, 2025

TL;DR

This paper shows that controlling the complexity of transformers' internal mechanisms encourages reasoning-based generalization over memorization, improving their performance on compositional tasks across various domains.

Contribution

It introduces complexity control strategies and masking techniques to steer transformers toward reasoning-based solutions, revealing internal mechanisms linked to better generalization.

Findings

1. Complexity control influences solution type in transformers.
2. Reasoning solutions exhibit lower complexity bias.
3. Validated across multiple real-world datasets.

Abstract

Transformers have demonstrated impressive capabilities across various tasks, yet their performance on compositional problems remains a subject of debate. In this study, we investigate the internal mechanisms underlying Transformers' behavior in compositional tasks. We find that complexity control strategies significantly influence whether the model learns primitive-level rules that generalize out-of-distribution (reasoning-based solutions) or relies solely on memorized mappings (memory-based solutions). By applying masking strategies to the model's information circuits and employing multiple complexity metrics, we reveal distinct internal working mechanisms associated with different solution types. Further analysis reveals that reasoning-based solutions exhibit a lower complexity bias, which aligns with the well-studied neuron condensation phenomenon. This lower complexity bias is…
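The abstract's distinction between memory-based and reasoning-based solutions can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not the paper's actual task or model: the four additive primitives, the held-out pair `(4, 3)`, and the helper names `memory_solution` and `reasoning_solution` are hypothetical. A lookup table stands in for memorized mappings, and per-primitive offsets recovered from the training data stand in for primitive-level rules.

```python
from itertools import product

# Hypothetical primitives: each key names a simple additive rule.
PRIMITIVES = {
    1: lambda x: x + 5,
    2: lambda x: x + 1,
    3: lambda x: x - 2,
    4: lambda x: x - 8,
}

def compose(a, b, x):
    """Ground truth: apply primitive a, then primitive b."""
    return PRIMITIVES[b](PRIMITIVES[a](x))

HELD_OUT = (4, 3)      # composition never seen during "training"
INPUTS = range(20, 30)

# Training set: every (pair, input) except the held-out pair.
train = {
    (a, b, x): compose(a, b, x)
    for a, b in product(PRIMITIVES, repeat=2)
    for x in INPUTS
    if (a, b) != HELD_OUT
}

def memory_solution(a, b, x):
    """Memorized mapping: returns None for anything not seen in training."""
    return train.get((a, b, x))

def fit_offsets(data):
    """Recover one offset per primitive from training pairs of the form (a, a)."""
    x0 = INPUTS[0]
    return {a: (data[(a, a, x0)] - x0) // 2 for a in PRIMITIVES}

OFFSETS = fit_offsets(train)

def reasoning_solution(a, b, x):
    """Primitive-level rule: compose the learned per-primitive offsets."""
    return x + OFFSETS[a] + OFFSETS[b]

if __name__ == "__main__":
    print("memory   :", memory_solution(4, 3, 25))
    print("reasoning:", reasoning_solution(4, 3, 25))
    print("truth    :", compose(4, 3, 25))
```

Both functions agree on every training example, so in-distribution accuracy alone cannot tell them apart. Only the held-out composition separates the two, which is why out-of-distribution evaluation is the relevant test for this kind of task.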

Code & Models

Repositories

sjtuzzw/complexity_control (PyTorch, official)

Taxonomy

Topics: Neural Networks and Applications