An explainable transformer circuit for compositional generalization

Cheng Tang; Brenden Lake; Mehrdad Jazayeri

arXiv:2502.15801 · cs.LG · February 25, 2025

TL;DR

This paper identifies and interprets the circuit within a compact transformer that is responsible for compositional generalization, enabling better understanding and control of the model's behavior.

Contribution

It identifies and mechanistically interprets the circuit for compositional induction in a compact transformer, advancing interpretability and controllability.

Findings

01. Validated the circuit through causal ablations.
02. Formalized the circuit with a program-like description.
03. Enabled precise activation edits to steer behavior.
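
The listing does not describe how the causal ablations are performed. As a rough illustration only, here is a minimal sketch of one common variant, mean-ablation: a component's output is replaced by its average over a dataset, and the resulting change in task accuracy indicates how much the behavior depends on that component. Everything in it is hypothetical and is not the paper's model or circuit: the two-dimensional residual stream, the components `copy_head` and `noise_head`, the task, and the helpers `run`, `predict`, `mean_ablation_vector`, and `accuracy`.

```python
from statistics import mean
from typing import Callable, Dict, List, Optional

Vector = List[float]

# Hypothetical components that write into a 2-d residual stream.
# "copy_head" carries the task signal; "noise_head" adds the same
# amount to both logits, so it cannot change the prediction.
COMPONENTS: Dict[str, Callable[[Vector], Vector]] = {
    "copy_head": lambda x: [x[0], -x[0]],
    "noise_head": lambda x: [0.1 * x[1], 0.1 * x[1]],
}

# Hypothetical task: label 0 if x[0] > 0, else label 1.
DATA: List[Vector] = [[1.0, 0.5], [2.0, -1.0], [-1.0, 0.3], [-2.0, 0.7]]
LABELS: List[int] = [0, 0, 1, 1]


def run(x: Vector, overrides: Optional[Dict[str, Vector]] = None) -> Vector:
    """Sum component outputs into the residual stream; any component named
    in `overrides` has its output replaced by the given fixed vector."""
    overrides = overrides or {}
    resid = [0.0, 0.0]
    for name, comp in COMPONENTS.items():
        out = overrides[name] if name in overrides else comp(x)
        resid = [r + o for r, o in zip(resid, out)]
    return resid


def predict(x: Vector, overrides: Optional[Dict[str, Vector]] = None) -> int:
    logits = run(x, overrides)
    return 0 if logits[0] >= logits[1] else 1


def mean_ablation_vector(name: str, data: List[Vector]) -> Vector:
    """Average output of one component over the dataset."""
    outs = [COMPONENTS[name](x) for x in data]
    return [mean(col) for col in zip(*outs)]


def accuracy(data, labels, overrides=None) -> float:
    correct = sum(predict(x, overrides) == y for x, y in zip(data, labels))
    return correct / len(data)


if __name__ == "__main__":
    print("full model:", accuracy(DATA, LABELS))
    for name in COMPONENTS:
        ablated = {name: mean_ablation_vector(name, DATA)}
        print(f"mean-ablate {name}:", accuracy(DATA, LABELS, ablated))
```

On this toy, ablating `copy_head` drops accuracy from 1.0 to 0.5 (chance), while ablating `noise_head` leaves it at 1.0, which is the kind of evidence used to argue that a component belongs to a circuit.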

Abstract

Compositional generalization, the systematic combination of known components into novel structures, remains a core challenge in cognitive science and machine learning. Although transformer-based large language models can exhibit strong performance on certain compositional tasks, the underlying mechanisms driving these abilities remain opaque, calling into question their interpretability. In this work, we identify and mechanistically interpret the circuit responsible for compositional induction in a compact transformer. Using causal ablations, we validate the circuit and formalize its operation using a program-like description. We further demonstrate that this mechanistic understanding enables precise activation edits to steer the model's behavior predictably. Our findings advance the understanding of complex behaviors in transformers and highlight that such insights can provide a direct…
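
The abstract does not specify the form of the activation edits. As a hedged illustration of the general idea, the sketch below uses activation patching, a standard interpretability technique that may differ from the paper's exact edits: the model runs on a target input, but one component's output is copied in from a run on a different source input. If that component carries the decision, the prediction follows the source. All names here (`COMPONENTS`, `copy_head`, `noise_head`, `patch_from`, `TARGET`, `SOURCE`) are hypothetical toys, not the paper's model.

```python
from typing import Callable, Dict, List, Optional

Vector = List[float]

# Hypothetical components writing into a 2-d residual stream.
COMPONENTS: Dict[str, Callable[[Vector], Vector]] = {
    "copy_head": lambda x: [x[0], -x[0]],            # carries the decision
    "noise_head": lambda x: [0.1 * x[1], 0.1 * x[1]],  # decision-irrelevant
}

TARGET: Vector = [1.0, 0.5]    # clean prediction: 0
SOURCE: Vector = [-2.0, 0.7]   # clean prediction: 1


def run(x: Vector, patches: Optional[Dict[str, Vector]] = None) -> Vector:
    """Sum component outputs; components in `patches` use the stored vector."""
    patches = patches or {}
    resid = [0.0, 0.0]
    for name, comp in COMPONENTS.items():
        out = patches[name] if name in patches else comp(x)
        resid = [r + o for r, o in zip(resid, out)]
    return resid


def predict(x: Vector, patches: Optional[Dict[str, Vector]] = None) -> int:
    logits = run(x, patches)
    return 0 if logits[0] >= logits[1] else 1


def patch_from(source: Vector, target: Vector, name: str) -> int:
    """Run `target`, but with component `name` output cached from `source`."""
    cached = COMPONENTS[name](source)
    return predict(target, {name: cached})


if __name__ == "__main__":
    print("target:", predict(TARGET), "source:", predict(SOURCE))
    for name in COMPONENTS:
        print(f"patch {name} from source:", patch_from(SOURCE, TARGET, name))
```

On this toy, patching `copy_head` from `SOURCE` flips the target's prediction from 0 to 1, while patching `noise_head` leaves it at 0, so the edit steers behavior only when it hits the component that actually carries the decision.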

Taxonomy

Topics: Semantic Web and Ontologies · Biomedical Text Mining and Ontologies · Natural Language Processing Techniques