Interpreting Key Mechanisms of Factual Recall in Transformer-Based Language Models

Ang Lv; Yuhan Chen; Kaiyi Zhang; Yulong Wang; Lifeng Liu; Ji-Rong Wen; Jian Xie; Rui Yan

arXiv:2403.19521 · cs.CL · May 27, 2024 · 2 cites

TL;DR

This paper investigates the internal mechanisms of Transformer-based language models in factual recall, proposing a novel interpretative method and revealing a universal anti-overconfidence mechanism that can be mitigated to improve accuracy.

Contribution

It introduces a new analytic approach to decompose MLP outputs for better interpretability and uncovers a universal anti-overconfidence mechanism in models' final layers.

Findings

01 Attention heads extract topic tokens effectively.

02 MLPs amplify or erase token information.

03 Anti-overconfidence suppresses correct predictions.

Abstract

In this paper, we delve into several mechanisms employed by Transformer-based language models (LLMs) for factual recall tasks. We outline a pipeline consisting of three major steps: (1) Given a prompt "The capital of France is," task-specific attention heads extract the topic token, such as "France," from the context and pass it to subsequent MLPs. (2) As attention heads' outputs are aggregated with equal weight and added to the residual stream, the subsequent MLP acts as an "activation," which either erases or amplifies the information originating from individual heads. As a result, the topic token "France" stands out in the residual stream. (3) A deep MLP takes "France" and generates a component that redirects the residual stream towards the direction of the correct answer, i.e., "Paris." This procedure is akin to applying an implicit function such as…
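The three steps described in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the three-token vocabulary, the hand-picked orthogonal directions, and the helper names `activation_mlp` and `recall_mlp` are hypothetical and are not taken from the paper or its repository. The sketch only shows how equal-weight head outputs, an amplify-or-erase MLP, and a later MLP could move a residual vector from one token direction to another.

```python
# Toy illustration only: hand-picked vectors, not weights from any real model.
VOCAB = ["France", "Paris", "the"]

# Hypothetical unembedding directions: one orthogonal axis per token.
UNEMBED = {
    "France": [1.0, 0.0, 0.0],
    "Paris": [0.0, 1.0, 0.0],
    "the": [0.0, 0.0, 1.0],
}


def add(u, v):
    return [a + b for a, b in zip(u, v)]


def scale(u, c):
    return [c * a for a in u]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def top_token(residual):
    """Project the residual onto each token direction and return the best match."""
    scores = {tok: dot(residual, UNEMBED[tok]) for tok in VOCAB}
    return max(scores, key=scores.get)


# Step 1: head outputs are summed into the residual stream with equal weight.
head_outputs = [
    scale(UNEMBED["France"], 1.0),  # assumed task-specific head carrying the topic token
    scale(UNEMBED["the"], 1.2),     # assumed unrelated head carrying a function word
]
after_attention = [0.0, 0.0, 0.0]
for out in head_outputs:
    after_attention = add(after_attention, out)


# Step 2: an MLP acting like an "activation": amplify one head's signal, erase another's.
def activation_mlp(residual):
    amplify = scale(UNEMBED["France"], 2.0 * dot(residual, UNEMBED["France"]))
    erase = scale(UNEMBED["the"], -1.0 * dot(residual, UNEMBED["the"]))
    return add(amplify, erase)


after_activation = add(after_attention, activation_mlp(after_attention))


# Step 3: a deeper MLP reads the topic direction and writes toward the answer direction.
def recall_mlp(residual):
    return scale(UNEMBED["Paris"], 2.0 * dot(residual, UNEMBED["France"]))


final_residual = add(after_activation, recall_mlp(after_activation))

print(top_token(after_attention))   # the
print(top_token(after_activation))  # France
print(top_token(final_residual))    # Paris
```

In this toy setup the plain sum of head outputs is dominated by the function word, the second step makes the topic token stand out, and the third step redirects the residual toward the answer. Real models use learned, non-orthogonal directions, so this is a picture of the described pipeline rather than a reproduction of it.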

Code & Models

Repositories

trestad/factual-recall-mechanism (PyTorch, official)

Taxonomy

Topics: Topic Modeling

Methods: Attention Is All You Need · Softmax · Dropout · Linear Layer · Dense Connections · Adam · Layer Normalization · OPT · Weight Decay