Answer, Assemble, Ace: Understanding How LMs Answer Multiple Choice Questions

Sarah Wiegreffe; Oyvind Tafjord; Yonatan Belinkov; Hannaneh Hajishirzi; Ashish Sabharwal

arXiv:2407.15018 · cs.CL · March 11, 2025 · 1 citation

TL;DR

This paper investigates how transformer language models perform on multiple-choice question answering by analyzing internal mechanisms, identifying key layers and attention heads responsible for answer prediction, and proposing a synthetic task to diagnose model learning.

Contribution

It applies vocabulary projection and activation patching to localize the hidden states that drive answer prediction on MCQA tasks, revealing the roles of specific layers and attention heads, and presents a synthetic diagnostic task for error analysis.

Findings

01

Prediction of a specific answer symbol is causally attributed to a few middle layers, specifically their multi-head self-attention mechanisms.

02

Answer probability increases in later layers, driven by specific attention heads.

03

Models' ability to distinguish answer choices improves over training.
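Finding 02 relies on reading out answer probabilities layer by layer. The following is a minimal sketch of vocabulary projection on toy data, not the authors' code: the sizes, the random `unembed` matrix, the `answer_id` token index, and the way the toy hidden states are built are all assumptions made for illustration. The unembedding columns are made orthonormal only so that the toy example behaves predictably.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def project_to_vocab(hidden_states, unembed):
    """Map per-layer hidden states (num_layers, d_model) to vocabulary
    probabilities (num_layers, vocab_size) via the unembedding matrix."""
    return softmax(hidden_states @ unembed)

rng = np.random.default_rng(0)
num_layers, d_model, vocab_size = 6, 16, 10  # hypothetical toy sizes

# Hypothetical unembedding matrix with orthonormal columns (toy assumption).
unembed, _ = np.linalg.qr(rng.normal(size=(d_model, vocab_size)))
answer_id = 3  # hypothetical vocabulary index of the answer symbol

# Toy hidden states at the final token position: each layer adds one more
# unit of the answer symbol's unembedding direction to a random base vector.
base = rng.normal(size=d_model)
hidden_states = np.stack(
    [base + layer * unembed[:, answer_id] for layer in range(num_layers)]
)

probs = project_to_vocab(hidden_states, unembed)
answer_prob_by_layer = probs[:, answer_id]
print(np.round(answer_prob_by_layer, 3))
```

In this toy setup the answer probability rises with depth by construction. With a real model, the hidden states would come from the model's own forward pass, and a final layer normalization is typically applied before the unembedding.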

Abstract

Multiple-choice question answering (MCQA) is a key competence of performant transformer language models that is tested by mainstream benchmarks. However, recent evidence shows that models can have quite a range of performance, particularly when the task format is diversified slightly (such as by shuffling answer choice order). In this work we ask: how do successful models perform formatted MCQA? We employ vocabulary projection and activation patching methods to localize key hidden states that encode relevant information for predicting the correct answer. We find that the prediction of a specific answer symbol is causally attributed to a few middle layers, and specifically their multi-head self-attention mechanisms. We show that subsequent layers increase the probability of the predicted answer symbol in vocabulary space, and that this probability increase is associated with a sparse set…
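To make the activation patching idea concrete, here is a minimal sketch on a toy residual network, assuming random weights and inputs. It is not the paper's experimental setup: `run_model`, the tanh layers standing in for attention and MLP sublayers, and the `clean_x` / `corrupt_x` inputs are all hypothetical. The idea illustrated is to cache activations from one run, overwrite a single layer's output in another run, and measure how the answer logit changes.

```python
import numpy as np

def run_model(x, weights, unembed, patch=None):
    """Run a toy residual stack and return (logits, per-layer outputs).

    patch: optional (layer_index, activation) pair; that layer's output is
    replaced by the given activation before it is added to the residual.
    """
    h = x
    cache = []
    for i, w in enumerate(weights):
        out = np.tanh(h @ w)  # stand-in for an attention or MLP sublayer
        if patch is not None and patch[0] == i:
            out = patch[1]
        cache.append(out)
        h = h + out
    return h @ unembed, cache

rng = np.random.default_rng(0)
num_layers, d_model, vocab_size = 4, 8, 5  # hypothetical toy sizes
weights = [rng.normal(scale=0.5, size=(d_model, d_model)) for _ in range(num_layers)]
unembed = rng.normal(size=(d_model, vocab_size))
answer_id = 2  # hypothetical vocabulary index of the answer symbol

clean_x = rng.normal(size=d_model)    # stands in for the original prompt
corrupt_x = rng.normal(size=d_model)  # stands in for an altered prompt

clean_logits, clean_cache = run_model(clean_x, weights, unembed)
corrupt_logits, corrupt_cache = run_model(corrupt_x, weights, unembed)

# Patch each layer's clean output into the corrupted run, one layer at a
# time, and record the change in the answer logit relative to no patching.
effects = []
for layer in range(num_layers):
    patched_logits, _ = run_model(
        corrupt_x, weights, unembed, patch=(layer, clean_cache[layer])
    )
    effects.append(patched_logits[answer_id] - corrupt_logits[answer_id])
effects = np.array(effects)
print(np.round(effects, 3))
```

Layers whose patched activation moves the answer logit the most are the candidates for carrying the relevant information. In a real transformer the same loop would be applied per sublayer, or per attention head.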

Taxonomy

Topics: Cognitive Science and Education Research · Language, Metaphor, and Cognition · Discourse Analysis in Language Studies

Methods: Softmax · Attention Is All You Need · Sparse Evolutionary Training · Activation Patching