Interpreting Key Mechanisms of Factual Recall in Transformer-Based Language Models
Ang Lv, Yuhan Chen, Kaiyi Zhang, Yulong Wang, Lifeng Liu, Ji-Rong Wen, Jian Xie, Rui Yan

TL;DR
This paper investigates the internal mechanisms of Transformer-based language models in factual recall, proposing a novel interpretative method and revealing a universal anti-overconfidence mechanism that can be mitigated to improve accuracy.
Contribution
It introduces a new analytic approach to decompose MLP outputs for better interpretability and uncovers a universal anti-overconfidence mechanism in models' final layers.
Findings
Attention heads extract topic tokens effectively.
MLPs amplify or erase token information.
Anti-overconfidence suppresses correct predictions.
Abstract
In this paper, we delve into several mechanisms employed by Transformer-based language models (LLMs) for factual recall tasks. We outline a pipeline consisting of three major steps: (1) Given a prompt "The capital of France is," task-specific attention heads extract the topic token, such as "France," from the context and pass it to subsequent MLPs. (2) As attention heads' outputs are aggregated with equal weight and added to the residual stream, the subsequent MLP acts as an "activation," which either erases or amplifies the information originating from individual heads. As a result, the topic token "France" stands out in the residual stream. (3) A deep MLP takes "France" and generates a component that redirects the residual stream towards the direction of the correct answer, i.e., "Paris." This procedure is akin to applying an implicit function such as…
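The data flow in steps (1) to (3) can be illustrated with a toy NumPy sketch. This is not the authors' code: the weights are random, layer normalization is omitted, the MLP is assumed to be a plain two-layer ReLU network, and every name (`add_heads_to_residual`, `mlp`, `logit_lens`) is hypothetical. It only shows how equally weighted head outputs enter the residual stream and how an MLP's contribution can be read off as a change in vocabulary logits.

```python
import numpy as np

def add_heads_to_residual(residual, head_outputs):
    # Head outputs are summed with equal weight and added to the residual stream.
    return residual + head_outputs.sum(axis=0)

def mlp(x, w_in, w_out):
    # Assumed two-layer ReLU MLP; real models may use GELU or gated variants.
    return np.maximum(x @ w_in, 0.0) @ w_out

def logit_lens(residual, unembed):
    # Project a residual-stream vector onto the vocabulary (no layer norm here).
    return residual @ unembed

# Toy sizes and random weights, chosen only for illustration.
rng = np.random.default_rng(0)
d_model, d_hidden, n_heads, vocab = 16, 64, 4, 5
residual = rng.normal(size=d_model)
head_outputs = rng.normal(size=(n_heads, d_model))
w_in = rng.normal(size=(d_model, d_hidden))
w_out = rng.normal(size=(d_hidden, d_model))
unembed = rng.normal(size=(d_model, vocab))

# Steps (1)-(2): heads write into the residual stream, then the MLP reads it.
after_attn = add_heads_to_residual(residual, head_outputs)
# Step (3): the MLP output is added back, shifting the residual stream.
after_mlp = after_attn + mlp(after_attn, w_in, w_out)

logits_before = logit_lens(after_attn, unembed)
logits_after = logit_lens(after_mlp, unembed)
# Per-token logit change attributable to the MLP in this simplified setting.
print(logits_after - logits_before)
```

Because the readout in this sketch is linear, the logit change equals the MLP output projected through the unembedding matrix, which is what makes a per-component reading of the residual stream possible in this simplified setting.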
Taxonomy
Topics: Topic Modeling
Methods: Attention Is All You Need · Softmax · Dropout · Linear Layer · Dense Connections · Adam · Layer Normalization · OPT · Weight Decay
