Towards Interpreting Language Models: A Case Study in Multi-Hop Reasoning
Mansi Sakarvadia

TL;DR
This paper introduces a targeted memory injection method and an interpretability tool, Attention Lens, to analyze and improve multi-hop reasoning in language models and to address their reasoning failures.
Contribution
It proposes a novel memory injection technique for enhancing multi-hop reasoning and develops Attention Lens for interpreting attention heads in language models.
Findings
Memory injections can increase the probability of the desired next token in multi-hop tasks by up to 424%.
Small subsets of attention heads significantly influence model predictions.
Attention Lens helps identify sources of model failures and biases.
Abstract
Answering multi-hop reasoning questions requires retrieving and synthesizing information from diverse sources. Language models (LMs) struggle to perform such reasoning consistently. We propose an approach to pinpoint and rectify multi-hop reasoning failures through targeted memory injections on LM attention heads. First, we analyze the per-layer activations of GPT-2 models in response to single- and multi-hop prompts. We then propose a mechanism that allows users to inject relevant prompt-specific information, which we refer to as "memories," at critical LM locations during inference. By thus enabling the LM to incorporate additional relevant information during inference, we enhance the quality of multi-hop prompt completions. We empirically show that a simple, efficient, and targeted memory injection into a key attention layer often increases the probability of the desired next token…
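The injection step described in the abstract can be illustrated with a small NumPy sketch. This is a toy under stated assumptions, not the paper's implementation: the function names `inject_memory` and `next_token_probs`, the magnitude parameter `tau`, the random unit-norm embeddings, and the tied unembedding matrix are all hypothetical choices made for illustration. The toy also reads the next-token distribution directly from the modified attention output, skipping the later layers of a real model, so it shows only the mechanics of adding a memory vector and not the multi-hop effect itself.

```python
import numpy as np


def inject_memory(attn_out, embedding, memory_ids, tau=1.0):
    """Add a scaled memory vector to every position of an attention output.

    attn_out:   (seq_len, d_model) output of one attention layer (toy values)
    embedding:  (vocab_size, d_model) token embedding matrix
    memory_ids: token ids that make up the memory to inject
    tau:        injection magnitude (hypothetical hyperparameter)
    """
    # Embed the memory tokens and sum them into a single d_model vector.
    memory_vec = embedding[memory_ids].sum(axis=0)
    # Broadcasting adds the same vector at every sequence position.
    return attn_out + tau * memory_vec


def next_token_probs(hidden_last, unembedding):
    """Softmax over vocabulary logits for the last position."""
    logits = hidden_last @ unembedding
    logits = logits - logits.max()  # subtract max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()


rng = np.random.default_rng(0)
vocab_size, d_model, seq_len = 50, 16, 5

# Assumption: random unit-norm embeddings with a tied unembedding matrix.
embedding = rng.normal(size=(vocab_size, d_model))
embedding /= np.linalg.norm(embedding, axis=1, keepdims=True)
unembedding = embedding.T

# Stand-in for the output of a mid-network attention layer.
attn_out = rng.normal(size=(seq_len, d_model))

memory_ids = [7]  # hypothetical token id for the injected memory
target_id = 7     # token whose probability we track

before = next_token_probs(attn_out[-1], unembedding)[target_id]
injected = inject_memory(attn_out, embedding, memory_ids, tau=2.0)
after = next_token_probs(injected[-1], unembedding)[target_id]

print(f"p(target) before injection: {before:.4f}")
print(f"p(target) after injection:  {after:.4f}")
```

In a real model, the same addition would typically be applied through a hook on the chosen attention layer during the forward pass, with the choice of layer and magnitude treated as tunable.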
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Attention Is All You Need · Cosine Annealing · Linear Warmup With Cosine Annealing · Dense Connections · Layer Normalization · Adam · Attention Dropout · Linear Layer · Weight Decay
