Towards Interpreting Language Models: A Case Study in Multi-Hop Reasoning

Mansi Sakarvadia

arXiv:2411.05037·cs.CL·November 11, 2024

TL;DR

This paper introduces a targeted memory injection method and an interpretability tool called Attention Lens to improve and analyze multi-hop reasoning in language models, addressing their reasoning failures.

Contribution

It proposes a novel memory injection technique for enhancing multi-hop reasoning and develops Attention Lens for interpreting attention heads in language models.

Findings

1. Memory injections can increase the probability of the desired next token in multi-hop tasks by up to 424%.

2. Small subsets of attention heads significantly influence model predictions.

3. Attention Lens helps identify sources of model failures and biases.

Abstract

Answering multi-hop reasoning questions requires retrieving and synthesizing information from diverse sources. Language models (LMs) struggle to perform such reasoning consistently. We propose an approach to pinpoint and rectify multi-hop reasoning failures through targeted memory injections on LM attention heads. First, we analyze the per-layer activations of GPT-2 models in response to single- and multi-hop prompts. We then propose a mechanism that allows users to inject relevant prompt-specific information, which we refer to as "memories," at critical LM locations during inference. By thus enabling the LM to incorporate additional relevant information during inference, we enhance the quality of multi-hop prompt completions. We empirically show that a simple, efficient, and targeted memory injection into a key attention layer often increases the probability of the desired next token…
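
As a rough illustration of the injection idea described in the abstract, the toy sketch below adds a scaled "memory" embedding to an attention layer's output and compares the probability of the desired next token before and after. Everything in it is an assumption made for illustration: the three-token vocabulary, the hand-set embeddings, the single matrix `DOWNSTREAM` standing in for all later layers, and the scale `tau` are hypothetical and are not taken from the paper or from the attentionlens repository.

```python
import math

# Hypothetical toy vocabulary: an intermediate entity and two candidate answers.
VOCAB = ["Reef", "Australia", "Brazil"]

# Hypothetical token embeddings (one row per token), reused for unembedding.
EMBEDDINGS = [
    [1.0, 0.0, 0.0],  # "Reef": the intermediate entity (the "memory")
    [0.0, 1.0, 0.0],  # "Australia": the desired answer
    [0.0, 0.0, 1.0],  # "Brazil": a distractor
]

# Stand-in for all layers after the injection point (an assumption):
# it maps the "Reef" direction onto the "Australia" direction.
DOWNSTREAM = [
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def inject_memory(attn_out, memory_ids, tau):
    """Add the summed embeddings of the memory tokens, scaled by tau."""
    dim = len(attn_out)
    memory = [sum(EMBEDDINGS[t][d] for t in memory_ids) for d in range(dim)]
    return [a + tau * m for a, m in zip(attn_out, memory)]


def next_token_probs(hidden):
    """Apply the downstream stand-in with a residual connection, then unembed."""
    mixed = [
        h + sum(w * x for w, x in zip(row, hidden))
        for h, row in zip(hidden, DOWNSTREAM)
    ]
    logits = [sum(m * e for m, e in zip(mixed, emb)) for emb in EMBEDDINGS]
    return softmax(logits)


# Attention-layer output for a multi-hop prompt where the model leans
# toward the distractor (values chosen by hand for the example).
attn_out = [0.0, 0.2, 0.5]
answer = VOCAB.index("Australia")

p_base = next_token_probs(attn_out)[answer]
injected = inject_memory(attn_out, [VOCAB.index("Reef")], tau=2.0)
p_injected = next_token_probs(injected)[answer]

print(f"P(Australia) without injection: {p_base:.3f}")
print(f"P(Australia) with injection:    {p_injected:.3f}")
```

In this toy setting the injected vector only helps because the downstream stand-in already links the intermediate entity to the answer. Which layer to inject into and how large to make `tau` are left here as free choices.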

Code & Models

Repositories

msakarvadia/attentionlens (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Attention Is All You Need · Cosine Annealing · Linear Warmup With Cosine Annealing · Dense Connections · Layer Normalization · Adam · Attention Dropout · Linear Layer · Weight Decay