Mechanistic Unveiling of Transformer Circuits: Self-Influence as a Key to Model Reasoning

Lin Zhang; Lijie Hu; Di Wang

arXiv:2502.09022·cs.AI·February 17, 2025

TL;DR

This paper investigates the internal reasoning mechanisms of GPT-2 using circuit analysis and self-influence functions, revealing human-interpretable multi-step reasoning paths in language models.

Contribution

It introduces a novel approach combining circuit analysis and self-influence to interpret multi-step reasoning in transformer models.

Findings

1. GPT-2's reasoning process can be mapped to human-interpretable circuits.
2. Self-influence functions highlight token importance changes during reasoning.
3. The methodology uncovers explicit reasoning paths in transformer models.

Abstract

Transformer-based language models have achieved significant success; however, their internal mechanisms remain largely opaque due to the complexity of non-linear interactions and high-dimensional operations. While previous studies have demonstrated that these models implicitly embed reasoning trees, humans typically employ various distinct logical reasoning mechanisms to complete the same task. It is still unclear which multi-step reasoning mechanisms are used by language models to solve such tasks. In this paper, we aim to address this question by investigating the mechanistic interpretability of language models, particularly in the context of multi-step reasoning tasks. Specifically, we employ circuit analysis and self-influence functions to evaluate the changing importance of each token throughout the reasoning process, allowing us to map the reasoning paths adopted by the model. We…
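To make the notion of self-influence concrete, the following is a minimal sketch on a toy least-squares model, not the paper's GPT-2 setup. It assumes the common influence-function form, in which the self-influence of an example is its loss gradient multiplied by the inverse Hessian and then by the same gradient again. The function name `self_influence`, the `damping` term, and the synthetic data are illustrative assumptions; the paper's exact formulation and its per-token application may differ.

```python
import numpy as np

def self_influence(X, y, damping=1e-3):
    """Toy self-influence g_i^T (H + damping*I)^{-1} g_i for least squares.

    Assumption: loss per example is 0.5 * (x_i . theta - y_i)^2 and H is the
    Hessian of the mean loss. This is an illustration, not the paper's method.
    """
    n, d = X.shape
    # Fit ordinary least-squares parameters.
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = X @ theta - y                      # shape (n,)
    grads = residuals[:, None] * X                 # per-example gradients, shape (n, d)
    hessian = X.T @ X / n + damping * np.eye(d)    # damped Hessian, shape (d, d)
    # Solve H v_i = g_i for every example at once, then take g_i . v_i.
    solved = np.linalg.solve(hessian, grads.T).T   # shape (n, d)
    return np.einsum("ij,ij->i", grads, solved)

# Synthetic data (hypothetical): 50 examples, 3 features, one corrupted label.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
y[7] += 5.0
scores = self_influence(X, y)
```

Higher scores mark examples whose own gradient has a large effect on their own loss. In the paper's setting the analogous quantity is tracked per token through the reasoning process rather than per training example in a linear model.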

Taxonomy

Topics: Advanced Text Analysis Techniques

Methods: Attention Is All You Need · Byte Pair Encoding · Layer Normalization · Residual Connection · Linear Layer · Dense Connections · Multi-Head Attention · Adam · Softmax