Wait, that's not an option: LLMs Robustness with Incorrect Multiple-Choice Options
Gracjan Góral, Emilia Wiśnios, Piotr Sankowski, Paweł Budzianowski

TL;DR
This paper evaluates how large language models handle multiple-choice questions with invalid options, revealing that alignment techniques can impair their critical reasoning and refusal abilities, especially in high-stakes tasks.
Contribution
It introduces a framework for assessing LLM robustness to invalid options and uncovers how alignment methods may reduce models' reflective judgment capabilities.
Findings
Base models refuse invalid options more often as model size grows
Alignment techniques can impair critical reasoning and reflective judgment
Post-training aligned models often default to selecting an invalid option when no valid answer is offered
Abstract
This work introduces a novel framework for evaluating LLMs' capacity to balance instruction-following with critical reasoning when presented with multiple-choice questions containing no valid answers. Through systematic evaluation across arithmetic, domain-specific knowledge, and high-stakes medical decision tasks, we demonstrate that post-training aligned models often default to selecting invalid options, while base models exhibit improved refusal capabilities that scale with model size. Our analysis reveals that alignment techniques, though intended to enhance helpfulness, can inadvertently impair models' reflective judgment--the ability to override default behaviors when faced with invalid options. We additionally conduct a parallel human study showing similar instruction-following biases, with implications for how these biases may propagate through human feedback datasets used in…
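The evaluation idea described in the abstract (pose a multiple-choice question whose options are all wrong, then check whether the model picks one anyway or declines) can be illustrated with a minimal sketch. The prompt wording, the keyword list in `REFUSAL_MARKERS`, and the helper names `build_prompt`, `classify_response`, and `refusal_rate` are all hypothetical; the paper's actual prompts and scoring rules are not given in this summary.

```python
import re

# Assumption: a crude keyword heuristic for spotting refusals.
# The paper's real scoring procedure may differ.
REFUSAL_MARKERS = ("none of", "no correct", "neither", "not an option")
OPTION_LABELS = "ABCDEFGH"


def build_prompt(question, options):
    """Format a multiple-choice prompt; every option may be invalid."""
    lines = [question]
    for label, text in zip(OPTION_LABELS, options):
        lines.append(f"{label}. {text}")
    lines.append("Answer with the letter of the correct option.")
    return "\n".join(lines)


def classify_response(response, n_options):
    """Label a model response as 'refusal', 'selection', or 'other'."""
    text = response.strip().lower()
    if any(marker in text for marker in REFUSAL_MARKERS):
        return "refusal"
    valid = OPTION_LABELS[:n_options].lower()
    # Accept bare letters such as "a", "(a)", "a." or "a:".
    if re.fullmatch(rf"\(?[{valid}]\)?[.:]?", text):
        return "selection"
    return "other"


def refusal_rate(responses, n_options):
    """Fraction of responses that reject all offered options."""
    if not responses:
        return 0.0
    labels = [classify_response(r, n_options) for r in responses]
    return labels.count("refusal") / len(labels)


if __name__ == "__main__":
    # Both options are wrong on purpose: 2 + 2 is neither 3 nor 5.
    print(build_prompt("What is 2 + 2?", ["3", "5"]))
    sample = ["A", "Neither option is correct.", "B.", "None of the options are right."]
    print(refusal_rate(sample, n_options=2))
```

Under this sketch, a higher refusal rate on all-invalid questions would correspond to the "reflective judgment" the abstract describes, while a high selection rate would indicate that instruction-following overrode it.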
Taxonomy
Topics: Natural Language Processing Techniques
Methods: LLaMA · Balanced Selection
