On Mechanistic Circuits for Extractive Question-Answering
Samyadeep Basu, Vlad Morariu, Zichao Wang, Ryan Rossi, Cherry Zhao, Soheil Feizi, Varun Manjunatha

TL;DR
This paper develops mechanistic circuits for extractive question-answering in language models, enabling better understanding, data attribution, and model steering using causal analysis and a new attribution algorithm.
Contribution
It introduces a method to extract circuits from language models, enabling reliable data attribution and model steering for extractive QA tasks.
Findings
Identified a small set of attention heads in the circuit that perform reliable data attribution by default.
Developed ATTNATTRIB, a state-of-the-art attribution algorithm.
Demonstrated model steering towards context-based answers.
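The summary does not describe how ATTNATTRIB works, so the following is only a toy sketch of the general idea of reading attribution off a single head's attention weights, not the paper's algorithm. The function name `attribute_to_span`, the attention values, and the span boundaries are all hypothetical and chosen for illustration.

```python
import numpy as np

def attribute_to_span(attn_weights, span_bounds):
    """Return the index of the context span with the most attention mass,
    together with the per-span mass.

    attn_weights: attention from one head, at the answer-generation step,
                  over the context tokens (hypothetical input).
    span_bounds:  list of (start, end) token ranges, end exclusive.
    """
    attn_weights = np.asarray(attn_weights, dtype=float)
    mass = np.array([attn_weights[s:e].sum() for s, e in span_bounds])
    return int(mass.argmax()), mass

# Hypothetical attention over 9 context tokens, split into 3 candidate spans.
attn = [0.02, 0.03, 0.05, 0.10, 0.40, 0.25, 0.05, 0.05, 0.05]
spans = [(0, 3), (3, 6), (6, 9)]

best, mass = attribute_to_span(attn, spans)
print("attributed span:", best, "mass per span:", mass)
```

In this sketch the answer is attributed to whichever span receives the largest share of the head's attention. Which heads to read from, and how to aggregate across them, is exactly what the extracted circuit would have to supply.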
Abstract
Large language models are increasingly used to process documents and facilitate question-answering on them. In this paper, we extract mechanistic circuits for this real-world language modeling task, context-augmented language modeling for extractive question-answering (QA), and examine the potential benefits of circuits for downstream applications such as data attribution to context information. We extract circuits as a function of internal model components (e.g., attention heads, MLPs) using causal mediation analysis techniques. Leveraging the extracted circuits, we first study the interplay between the model's usage of parametric memory and retrieved context, towards a better mechanistic understanding of context-augmented language models. We then identify a small set of attention heads in our circuit that perform reliable data attribution by default, thereby obtaining…
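Causal mediation analysis over model components is often carried out by activation patching: run the model on a clean and a corrupted input, swap one component's clean activation into the corrupted run, and measure how much of the output is restored. The sketch below illustrates this on a toy additive model in NumPy. The weights `W`, the `readout` vector, and the functions `head_outputs`, `logit`, and `patching_effects` are hypothetical stand-ins, not the paper's setup or any library API.

```python
import numpy as np

rng = np.random.default_rng(0)
n_heads, d = 4, 8
W = rng.normal(size=(n_heads, d, d))  # hypothetical per-head weights
readout = rng.normal(size=d)          # hypothetical direction for the answer logit

def head_outputs(x):
    """Output of each toy 'head' for input x, shape (n_heads, d)."""
    return np.stack([np.tanh(W[h] @ x) for h in range(n_heads)])

def logit(outputs):
    """Answer logit: head outputs are summed, then projected onto readout."""
    return float(readout @ outputs.sum(axis=0))

def patching_effects(x_clean, x_corrupt):
    """Per-head change in the logit when that head's clean activation
    is patched into the corrupted run, plus the total clean-vs-corrupt gap."""
    clean, corrupt = head_outputs(x_clean), head_outputs(x_corrupt)
    base = logit(corrupt)
    effects = []
    for h in range(n_heads):
        patched = corrupt.copy()
        patched[h] = clean[h]  # restore only head h
        effects.append(logit(patched) - base)
    return np.array(effects), logit(clean) - base

x_clean, x_corrupt = rng.normal(size=d), rng.normal(size=d)
effects, total = patching_effects(x_clean, x_corrupt)
print("per-head effects:", effects, "total gap:", total)
```

Because this toy model is additive across heads, the per-head effects sum to the total gap. In a real transformer, components interact through later layers, so the effects generally do not decompose this cleanly, and a circuit is usually taken to be the components with the largest effects.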
Taxonomy
Topics: Topic Modeling · Seismology and Earthquake Studies
Methods: Softmax · Attention Is All You Need · Sparse Evolutionary Training
