A Mechanistic Analysis of a Transformer Trained on a Symbolic Multi-Step Reasoning Task

Jannik Brinkmann; Abhay Sheshadri; Victor Levoso; Paul Swoboda; Christian Bartelt

arXiv:2402.11917·cs.LG·July 2, 2024·1 cites

TL;DR

This paper provides a detailed mechanistic analysis of a transformer trained on a synthetic multi-step reasoning task, revealing interpretable internal processes and mechanisms that facilitate reasoning.

Contribution

The paper presents a comprehensive mechanistic analysis of a transformer trained on a synthetic reasoning task and identifies a depth-bounded recurrent mechanism that the model uses to solve it.

Findings

01 The trained transformer implements a depth-bounded recurrent mechanism that operates in parallel.

02 Intermediate results are stored in selected token positions.

03 The identified motifs may carry over to the analysis of more complex models.
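As a loose illustration of what "depth-bounded" and "stored intermediate results" could look like, here is a toy sketch in plain Python. It is not the paper's model or code. The tree-shaped edge list, the function name `backward_chain`, and the `max_depth` parameter are all assumptions introduced here, motivated only by the name of the linked repository (backward-chaining-circuits). The loop takes at most a fixed number of steps, much as a transformer has a fixed number of layers, and records each intermediate node as it goes.

```python
from typing import Dict, List, Tuple


def backward_chain(
    edges: List[Tuple[int, int]], goal: int, max_depth: int
) -> Tuple[List[int], bool]:
    """Toy backward chaining on a tree given as (parent, child) edges.

    Hypothetical helper for illustration only: starting at `goal`, move to the
    parent node at most `max_depth` times, storing every visited node.
    Returns the path (top-most reached node first) and whether the root was reached.
    """
    # Map each child to its parent so one step is a single lookup.
    parent: Dict[int, int] = {child: par for par, child in edges}

    path = [goal]  # intermediate results are kept, not discarded
    node = goal
    for _ in range(max_depth):  # fixed step budget: the "depth bound"
        if node not in parent:  # no parent means we are at the root
            break
        node = parent[node]
        path.append(node)

    reached_root = node not in parent
    return list(reversed(path)), reached_root


if __name__ == "__main__":
    toy_edges = [(0, 1), (0, 2), (1, 3), (3, 4)]
    print(backward_chain(toy_edges, goal=4, max_depth=6))
    print(backward_chain(toy_edges, goal=4, max_depth=2))
```

With a budget of 6 steps the toy function reaches the root from node 4; with a budget of 2 it stops early and reports that the root was not reached, which is the sense in which a fixed-depth procedure can fail on inputs that need more steps than it has.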

Abstract

Transformers demonstrate impressive performance on a range of reasoning benchmarks. To evaluate the degree to which these abilities are a result of actual reasoning, existing work has focused on developing sophisticated benchmarks for behavioral studies. However, these studies do not provide insights into the internal mechanisms driving the observed capabilities. To improve our understanding of the internal mechanisms of transformers, we present a comprehensive mechanistic analysis of a transformer trained on a synthetic reasoning task. We identify a set of interpretable mechanisms the model uses to solve the task, and validate our findings using correlational and causal evidence. Our results suggest that the model implements a depth-bounded recurrent mechanism that operates in parallel and stores intermediate results in selected token positions. We anticipate that the motifs we identified in…

Code & Models

Repositories

abhay-sheshadri/backward-chaining-circuits (PyTorch, official)

Videos

A Mechanistic Analysis of a Transformer Trained on a Symbolic Multi-Step Reasoning Task (Underline)

Taxonomy

Topics: Evolutionary Algorithms and Applications

Methods: Sparse Evolutionary Training