Anchored Answers: Unravelling Positional Bias in GPT-2's Multiple-Choice Questions
Ruizhe Li, Yanjun Gao

TL;DR
This paper investigates the positional bias in GPT-2 models for multiple-choice questions, identifies internal mechanisms responsible, and proposes targeted interventions to mitigate bias and improve accuracy.
Contribution
It provides the first mechanistic analysis of anchored bias in GPT-2's MCQ performance and introduces minimal interventions to reduce bias and enhance robustness.
Findings
Mitigating bias improves GPT-2 accuracy on MCQs
The internal modules responsible for the bias are identified and modified
Targeted interventions significantly reduce anchored bias
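One way to make "anchored bias" measurable is sketched below. It counts how often the model picks 'A' when the correct answer is some other choice. The metric definition, the function name `anchored_bias_rate`, and the sample data are assumptions made for illustration; the paper's own evaluation protocol may differ.

```python
def anchored_bias_rate(predictions, gold_labels, anchor="A"):
    """Share of questions whose gold answer is not `anchor` but where the model still chose `anchor`."""
    # Keep only questions where the correct answer is something other than the anchor.
    non_anchor = [(p, g) for p, g in zip(predictions, gold_labels) if g != anchor]
    if not non_anchor:
        return 0.0
    return sum(p == anchor for p, _ in non_anchor) / len(non_anchor)


# Made-up predictions and gold labels for five questions (illustrative only).
predictions = ["A", "A", "B", "A", "C"]
gold_labels = ["A", "B", "B", "D", "C"]

rate = anchored_bias_rate(predictions, gold_labels)
print(f"anchored bias rate: {rate:.2f}")
```

Comparing this rate before and after an intervention would give a simple check on whether the bias towards 'A' has actually shrunk.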
Abstract
Large Language Models (LLMs), such as the GPT-4 and LLaMA families, have demonstrated considerable success across diverse tasks, including multiple-choice questions (MCQs). However, these models exhibit positional bias, and the GPT-2 family shows an even more severe form, anchored bias, in which the model consistently favours the first choice 'A' in MCQs during inference. This anchored bias challenges the integrity of GPT-2's decision-making process, as it skews performance based on the position rather than the content of the choices in MCQs. In this study, we use a mechanistic interpretability approach to identify the internal modules within GPT-2 models responsible for this bias. We focus on the Multi-Layer Perceptron (MLP) layers and attention heads, using the "logit lens" method to trace and modify the specific value vectors that contribute to the bias. By updating these vectors within MLP…
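The core idea of the "logit lens" mentioned in the abstract can be illustrated with a toy example: project a vector from the model's hidden space through the unembedding matrix and see which vocabulary tokens it promotes. The sketch below is not the authors' code. The four-token vocabulary, the matrix `W_U`, the `value_vector`, and the function name `logit_lens` are all hypothetical, and the final layer normalisation that a real GPT-2 applies before unembedding is omitted for brevity.

```python
# Toy logit-lens sketch; every number here is made up for illustration.
VOCAB = ["A", "B", "C", "D"]

# Hypothetical unembedding matrix: one row (hidden size 3) per vocabulary token.
W_U = [
    [1.0, 0.0, 0.5],   # "A"
    [0.0, 1.0, 0.0],   # "B"
    [0.0, 0.0, 1.0],   # "C"
    [-1.0, 0.0, 0.0],  # "D"
]


def logit_lens(vector, unembed=W_U, vocab=VOCAB):
    """Project a hidden-space vector onto the vocabulary and rank tokens by logit."""
    logits = [sum(w * x for w, x in zip(row, vector)) for row in unembed]
    return sorted(zip(vocab, logits), key=lambda pair: pair[1], reverse=True)


# Hypothetical MLP value vector that pushes the prediction towards "A".
value_vector = [2.0, 0.1, 0.4]

ranking = logit_lens(value_vector)
for token, logit in ranking:
    print(f"{token}: {logit:+.2f}")
```

In this toy setting, a value vector whose top-ranked token is 'A' would be a candidate contributor to the anchored bias. With a real model, the same projection would be applied to the MLP value vectors using the model's own unembedding weights.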
Taxonomy
Topics: Clinical Reasoning and Diagnostic Skills · Decision-Making and Behavioral Economics · Meta-analysis and systematic reviews
Methods: Attention Is All You Need · Label Smoothing · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Cosine Annealing · Dense Connections · Transformer · Dropout · Weight Decay
