Interpreting Affine Recurrence Learning in GPT-style Transformers

Samarth Bhargav; Alexander Gu

arXiv:2410.17438 · cs.LG · October 24, 2024

TL;DR

This paper investigates how GPT-style transformers internally learn affine recurrences in context. It finds a layered mechanism in which copying produces an initial estimate that a later layer refines, which clarifies how in-context learning and recursive tasks are processed.

Contribution

It introduces a mechanistic interpretability analysis of transformers trained on affine recurrences, uncovering a layered process of copying and refinement within the model.

Findings

01. Initial sequence estimates are made via copying in the zeroth layer.

02. Refinement occurs through negative similarity heads in the second layer.

03. These insights contribute to understanding how transformers handle recursive tasks.

Abstract

Understanding the internal mechanisms of GPT-style transformers, particularly their capacity to perform in-context learning (ICL), is critical for advancing AI alignment and interpretability. In-context learning allows transformers to generalize during inference without modifying their weights, yet the precise operations driving this capability remain largely opaque. This paper presents an investigation into the mechanistic interpretability of these transformers, focusing specifically on their ability to learn and predict affine recurrences as an ICL task. To address this, we trained a custom three-layer transformer to predict affine recurrences and analyzed the model's internal operations using both empirical and theoretical approaches. Our findings reveal that the model forms an initial estimate of the target sequence using a copying mechanism in the zeroth layer, which is…
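To make the task concrete, here is a minimal sketch of a scalar affine recurrence, x_{n+1} = a * x_n + b, and of predicting the next term from context alone. It illustrates the task only, not the mechanism the trained model uses. The scalar setting, the function names, and the closed-form coefficient recovery are assumptions chosen for illustration; the paper's actual data format and dimensionality are not stated on this page.

```python
from fractions import Fraction


def affine_sequence(a, b, x0, length):
    """Return [x_0, ..., x_{length-1}] with x_{n+1} = a * x_n + b."""
    seq = [x0]
    for _ in range(length - 1):
        seq.append(a * seq[-1] + b)
    return seq


def infer_coefficients(x0, x1, x2):
    """Recover (a, b) from three consecutive terms; requires x1 != x0."""
    # x2 - x1 = a * (x1 - x0), so a is the ratio of successive differences.
    a = Fraction(x2 - x1, x1 - x0)
    b = x1 - a * x0
    return a, b


def predict_next(context):
    """Predict the next term using only the terms seen in context."""
    a, b = infer_coefficients(*context[:3])
    return a * context[-1] + b


# Hypothetical example: a = 2, b = 3, starting from x_0 = 1.
context = affine_sequence(a=2, b=3, x0=1, length=5)
print(context)                # [1, 5, 13, 29, 61]
print(predict_next(context))  # 125
```

The coefficients a and b are never given to the predictor; they are inferred from the context, which is what makes this an in-context learning task rather than one solved by fixed weights.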

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning and Data Classification