Beyond Components: Singular Vector-Based Interpretability of Transformer Circuits

Areeb Ahmad; Abhinav Joshi; Ashutosh Modi

arXiv:2511.20273·cs.LG·November 26, 2025

TL;DR

This paper introduces a singular-vector-based approach to interpreting transformer circuits. It shows that computation inside attention heads and MLPs is distributed and structured, and that a single head or MLP can contain several independent subcomponents.

Contribution

The paper presents a fine-grained interpretability method that decomposes transformer components (attention heads and MLPs) into orthogonal singular directions, uncovering overlapping subfunctions and structured computations within a single component.

Findings

01. Transformer components contain multiple overlapping subfunctions.

02. Meaningful computations are localized in low-rank subspaces.

03. The approach reveals distributed and compositional internal representations.
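
To make the notion of a low-rank subspace concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's procedure: the matrix `W` is synthetic (a planted rank-`k` signal plus small noise), the sizes are arbitrary, and spectral energy is used only as a simple stand-in for "meaningful computation", which the paper may measure differently.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 32, 3  # toy sizes chosen for illustration, not from the paper

# Synthetic weight matrix (assumption): rank-k signal plus small dense noise.
signal = rng.standard_normal((d, k)) @ rng.standard_normal((k, d))
W = signal + 0.01 * rng.standard_normal((d, d))

# Singular value decomposition: W = U @ diag(S) @ Vt, with S sorted descending.
U, S, Vt = np.linalg.svd(W)

# Fraction of the squared Frobenius norm captured by the top-i directions.
energy = np.cumsum(S**2) / np.sum(S**2)

# Best rank-k approximation (Eckart-Young) and its relative error.
W_k = (U[:, :k] * S[:k]) @ Vt[:k]
rel_err = np.linalg.norm(W - W_k) / np.linalg.norm(W)

print(f"energy in top {k} directions: {energy[k - 1]:.6f}")
print(f"relative reconstruction error: {rel_err:.6f}")
```

In this toy setting, almost all of the matrix's energy sits in the first `k` singular directions, so discarding the rest changes `W` very little. Whether a trained head or MLP behaves this way on a given task is an empirical question that the paper addresses.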

Abstract

Transformer-based language models exhibit complex and distributed behavior, yet their internal computations remain poorly understood. Existing mechanistic interpretability methods typically treat attention heads and multilayer perceptron layers (MLPs) (the building blocks of a transformer architecture) as indivisible units, overlooking possibilities of functional substructure learned within them. In this work, we introduce a more fine-grained perspective that decomposes these components into orthogonal singular directions, revealing superposed and independent computations within a single head or MLP. We validate our perspective on widely used standard tasks like Indirect Object Identification (IOI), Gender Pronoun (GP), and Greater Than (GT), showing that previously identified canonical functional heads, such as the name mover, encode multiple overlapping subfunctions aligned with…
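
The decomposition described in the abstract can be sketched with NumPy as follows. This is a minimal illustration, not necessarily the authors' exact construction: `W_V`, `W_O`, the combined matrix `W_OV`, the helper `direction_output`, and all sizes are hypothetical and randomly initialized, and the row-vector convention (`x @ W`) is an assumption. The sketch splits one head's value-output map into orthogonal rank-1 singular directions and checks that their separate contributions sum to the head's full output.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_head = 16, 4  # toy sizes, not taken from the paper

# Hypothetical value and output projections of a single attention head.
W_V = rng.standard_normal((d_model, d_head))
W_O = rng.standard_normal((d_head, d_model))
W_OV = W_V @ W_O  # (d_model, d_model), rank at most d_head

# SVD: W_OV = sum_i S[i] * outer(U[:, i], Vt[i]).
U, S, Vt = np.linalg.svd(W_OV, full_matrices=False)
rank = int(np.sum(S > 1e-10 * S[0]))  # number of non-negligible directions


def direction_output(x, i):
    """Contribution of singular direction i to the OV output for input x."""
    return (x @ U[:, i]) * S[i] * Vt[i]


x = rng.standard_normal(d_model)  # stand-in for a residual-stream vector
parts = np.stack([direction_output(x, i) for i in range(rank)])
full = x @ W_OV

print("directions:", rank)
print("parts sum to full output:", np.allclose(parts.sum(axis=0), full))
```

Because the columns of `U` are orthonormal, each direction reads from its own orthogonal input subspace, which is what lets the contributions be studied (or ablated) independently of one another.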

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications · Topic Modeling