Beyond Components: Singular Vector-Based Interpretability of Transformer Circuits
Areeb Ahmad, Abhinav Joshi, Ashutosh Modi

TL;DR
This paper introduces a singular-vector-based approach to interpreting transformer circuits. It shows that the computations inside attention heads and MLPs are distributed and structured, and that they are composed of independent subcomponents, which gives a finer-grained mechanistic picture than treating each head or MLP as a single unit.
Contribution
It presents a novel, fine-grained interpretability method that decomposes transformer components into orthogonal singular directions, uncovering overlapping subfunctions and structured computations within them.
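The decomposition described above can be sketched for a single attention head as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the row-vector convention `out = x @ W_V @ W_O` (with `W_V` of shape `(d_model, d_head)` and `W_O` of shape `(d_head, d_model)`), applies NumPy's SVD to the combined OV matrix (the choice of the OV matrix is our assumption), and uses random matrices in place of trained weights. The helper names `ov_singular_directions` and `rank_one_components` are hypothetical.

```python
import numpy as np

def ov_singular_directions(W_V: np.ndarray, W_O: np.ndarray):
    """SVD of a head's combined OV matrix (hypothetical helper).

    Assumes the row-vector convention out = x @ W_V @ W_O, with
    W_V of shape (d_model, d_head) and W_O of shape (d_head, d_model).
    """
    W_OV = W_V @ W_O                      # (d_model, d_model), rank <= d_head
    U, S, Vh = np.linalg.svd(W_OV, full_matrices=False)
    d_head = W_V.shape[1]
    # At most d_head singular values can be nonzero, so keep only those.
    return W_OV, U[:, :d_head], S[:d_head], Vh[:d_head, :]

def rank_one_components(U, S, Vh):
    """One rank-1 matrix per singular direction.

    Component i reads along U[:, i] and writes along Vh[i]:
    x @ C_i == (x @ U[:, i]) * S[i] * Vh[i].
    """
    return [S[i] * np.outer(U[:, i], Vh[i]) for i in range(len(S))]

# Random stand-ins for trained weights (illustrative sizes only).
rng = np.random.default_rng(0)
d_model, d_head = 16, 4
W_V = rng.normal(size=(d_model, d_head))
W_O = rng.normal(size=(d_head, d_model))

W_OV, U, S, Vh = ov_singular_directions(W_V, W_O)
components = rank_one_components(U, S, Vh)
print(len(components), np.allclose(sum(components), W_OV))  # → 4 True
```

Because the read directions (columns of `U`) are orthonormal, as are the write directions (rows of `Vh`), each rank-1 component can be inspected or ablated on its own while the components still sum back to the full head.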
Findings
Transformer components contain multiple overlapping subfunctions.
Meaningful computations are localized in low-rank subspaces.
The approach reveals distributed and compositional internal representations.
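One way to make the low-rank finding concrete is to count how many singular directions are needed to account for most of a matrix's squared singular-value mass (its squared Frobenius norm), then check the error of the corresponding rank-k truncation. The 0.99 threshold, the helper names `directions_needed` and `truncate`, and the synthetic rank-2-plus-noise matrix are illustrative assumptions; the paper's own criterion for identifying meaningful subspaces may differ.

```python
import numpy as np

def directions_needed(S: np.ndarray, threshold: float = 0.99) -> int:
    """Smallest k such that the top-k singular values hold `threshold`
    of the total squared singular-value mass (hypothetical criterion)."""
    energy = np.cumsum(S**2) / np.sum(S**2)
    # First index where cumulative energy reaches the threshold, as a count.
    return int(np.searchsorted(energy, threshold) + 1)

def truncate(U, S, Vh, k: int) -> np.ndarray:
    """Rank-k approximation keeping only the top-k singular directions."""
    return (U[:, :k] * S[:k]) @ Vh[:k]

# Synthetic matrix: a planted rank-2 signal plus small dense noise.
rng = np.random.default_rng(1)
A = rng.normal(size=(32, 2)) @ rng.normal(size=(2, 32))
A += 0.01 * rng.normal(size=(32, 32))

U, S, Vh = np.linalg.svd(A, full_matrices=False)
k = directions_needed(S, threshold=0.99)
A_k = truncate(U, S, Vh, k)

# Eckart-Young: the Frobenius error equals the norm of the discarded singular values.
err = np.linalg.norm(A - A_k)
print(k, err, np.isclose(err, np.sqrt(np.sum(S[k:] ** 2))))
```

On trained weights, a `k` much smaller than the matrix dimension would be consistent with computation concentrated in a low-rank subspace; on an unstructured random matrix the singular-value mass is spread far more evenly.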
Abstract
Transformer-based language models exhibit complex and distributed behavior, yet their internal computations remain poorly understood. Existing mechanistic interpretability methods typically treat attention heads and multilayer perceptron layers (MLPs), the building blocks of a transformer architecture, as indivisible units, overlooking the possibility that these components learn functional substructure internally. In this work, we introduce a more fine-grained perspective that decomposes these components into orthogonal singular directions, revealing superposed and independent computations within a single head or MLP. We validate our perspective on widely used standard tasks such as Indirect Object Identification (IOI), Gender Pronoun (GP), and Greater Than (GT), showing that previously identified canonical functional heads, such as the name mover, encode multiple overlapping subfunctions aligned with…
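The claim that a single head can host several independent computations has a simple linear-algebra reading: for an input `x`, the output `x @ W` splits exactly into one additive term per singular direction, and removing one direction removes only its own term. The sketch below illustrates this with zero-ablation of individual directions on a random low-rank matrix. The helper names `direction_contributions` and `ablate`, and the choice of zero-ablation, are assumptions for illustration rather than the paper's intervention procedure.

```python
import numpy as np

def direction_contributions(x, U, S, Vh):
    """Split x @ W into one additive term per singular direction.

    Row i is (x . U[:, i]) * S[i] * Vh[i]: how strongly the input is read
    along u_i, scaled by s_i, and written along v_i.
    """
    coeffs = (x @ U) * S           # (k,) per-direction strength for this input
    return coeffs[:, None] * Vh    # (k, d_model)

def ablate(x, U, S, Vh, drop):
    """Output of x @ W with the singular directions in `drop` zeroed out."""
    keep = np.ones_like(S)
    keep[list(drop)] = 0.0
    return ((x @ U) * S * keep) @ Vh

rng = np.random.default_rng(2)
d_model, d_head = 16, 4
W = rng.normal(size=(d_model, d_head)) @ rng.normal(size=(d_head, d_model))
U, S, Vh = np.linalg.svd(W, full_matrices=False)
U, S, Vh = U[:, :d_head], S[:d_head], Vh[:d_head]   # keep the nonzero part

x = rng.normal(size=d_model)
parts = direction_contributions(x, U, S, Vh)
full = x @ W
gram = parts @ parts.T

print(np.allclose(parts.sum(axis=0), full))                        # → True (exact decomposition)
print(np.allclose(full - ablate(x, U, S, Vh, drop=[1]), parts[1]))  # → True (only term 1 removed)
print(np.allclose(gram, np.diag(np.diag(gram))))                    # → True (terms mutually orthogonal)
```

Because the write directions are orthonormal, the per-direction terms never interfere with one another in the output space, which is what allows each one to be studied or ablated in isolation.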
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications · Topic Modeling
