Talking Heads: Understanding Inter-layer Communication in Transformer Language Models
Jack Merullo, Carsten Eickhoff, Ellie Pavlick

TL;DR
This paper shows that transformer language models route information between layers through low-rank subspaces of the residual stream, and that analyzing and editing these subspaces can improve task performance.
Contribution
The paper identifies low-rank communication channels between transformer layers and demonstrates that analyzing and editing them can improve model performance on specific tasks.
Findings
Models use low-rank subspaces for feature routing.
Analysis of attention heads predicts inter-layer interactions.
Editing internal representations improves accuracy on a list-recall task, often by over 20%.
Abstract
Although it is known that transformer language models (LMs) pass features from early layers to later layers, it is not well understood how this information is represented and routed by the model. We analyze a mechanism used in two LMs to selectively inhibit items in a context in one task, and find that it underlies a commonly used abstraction across many context-retrieval behaviors. Specifically, we find that models write into low-rank subspaces of the residual stream to represent features which are then read out by later layers, forming low-rank communication channels (Elhage et al., 2021) between layers. A particular 3D subspace in model activations in GPT-2 can be traversed to positionally index items in lists, and we show that this mechanism can explain an otherwise arbitrary-seeming sensitivity of the model to the order of items in the prompt. That is, the model has trouble copying…
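The idea of a low-rank communication channel can be illustrated with a small NumPy sketch. This is a toy under stated assumptions, not the authors' code or data: the `writer` and `reader` matrices are random stand-ins for an earlier head's output map and a later head's input map, the sizes are arbitrary (rank 3 only echoes the 3D subspace mentioned in the abstract), and the score is a Frobenius-norm ratio in the style of the composition scores of Elhage et al. (2021).

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, rank = 64, 3  # toy sizes (assumption), not taken from any real model

# Hypothetical "writer" head: a rank-3 map into the residual stream.
U = np.linalg.qr(rng.normal(size=(d_model, rank)))[0]  # orthonormal write directions
writer = U @ rng.normal(size=(rank, d_model))          # d_model x d_model, rank 3

# Hypothetical "reader" that listens only along those same 3 directions,
# plus an unrelated random reader for comparison.
reader = rng.normal(size=(d_model, rank)) @ U.T
random_reader = rng.normal(size=(d_model, d_model))

def top_subspace(weight, k):
    """Orthonormal basis for the top-k left singular directions of `weight`."""
    left, _, _ = np.linalg.svd(weight, full_matrices=False)
    return left[:, :k]

def composition_score(read, write):
    """Frobenius-norm ratio: larger means `read` picks up more of what `write` emits."""
    return np.linalg.norm(read @ write) / (np.linalg.norm(read) * np.linalg.norm(write))

basis = top_subspace(writer, rank)                # recovered from the weights alone

x = rng.normal(size=d_model)                      # stand-in for a token representation
written = writer @ x                              # what the writer adds to the residual stream
ablated = written - basis @ (basis.T @ written)   # project the channel out

signal_before = np.linalg.norm(reader @ written)
signal_after = np.linalg.norm(reader @ ablated)

print("aligned score:", composition_score(reader, writer))
print("random score: ", composition_score(random_reader, writer))
print("reader signal before/after ablation:", signal_before, signal_after)
```

In this toy setup the aligned reader scores higher than the random one, and removing the three recovered directions from the written vector leaves the reader with essentially nothing to read. Real model weights are noisier, so the gap there would be a matter of degree rather than all-or-nothing.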
Taxonomy
Topics: Speech and dialogue systems
Methods: Attention Is All You Need · Linear Layer · Cosine Annealing · Dense Connections · Layer Normalization · Residual Connection · Linear Warmup With Cosine Annealing · Adam · Attention Dropout
