Who is the richest club in the championship? Detecting and Rewriting Underspecified Questions Improve QA Performance
Yunchong Huang, Gianni Barlacchi, Sandro Pezzelle

TL;DR
This paper investigates how underspecified questions affect QA performance, introduces a classifier to detect them, and shows that rewriting questions to be fully specified improves accuracy, highlighting a key evaluation confound.
Contribution
The paper presents an LLM-based classifier that identifies underspecified questions and shows that rewriting them into fully specified variants, with gold answers held fixed, improves QA results.
Findings
16% to over 50% of benchmark questions are underspecified.
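QA performance consistently improves when underspecified questions are rewritten to be fully specified while gold answers are held fixed.
LLMs perform significantly worse on underspecified questions, so underspecification acts as a confound in QA evaluation.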
Abstract
Large language models (LLMs) perform well on well-posed questions, yet standard question-answering (QA) benchmarks remain far from solved. We argue that this gap is partly due to underspecified questions: queries whose interpretation cannot be uniquely determined without additional context. To test this hypothesis, we introduce an LLM-based classifier to identify underspecified questions and apply it to several widely used QA datasets, finding that 16% to over 50% of benchmark questions are underspecified and that LLMs perform significantly worse on them. To isolate the effect of underspecification, we conduct a controlled rewriting experiment that serves as an upper-bound analysis, rewriting underspecified questions into fully specified variants while holding gold answers fixed. QA performance consistently improves under this setting, indicating that many apparent QA failures stem…
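The analysis described in the abstract (flag underspecified questions, then compare QA accuracy across the two groups) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: `QAItem`, `accuracy_by_specification`, and `toy_is_underspecified` are hypothetical names, the "classifier" is a trivial stand-in for the paper's LLM-based one, exact match is an assumed metric, and the four items are made-up placeholders rather than benchmark data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class QAItem:
    question: str
    gold: str
    prediction: str  # answer produced by some QA model (placeholder values here)


def exact_match(prediction: str, gold: str) -> bool:
    # Assumed metric: case-insensitive exact match after trimming whitespace.
    return prediction.strip().lower() == gold.strip().lower()


def accuracy_by_specification(
    items: List[QAItem],
    is_underspecified: Callable[[str], bool],
) -> Dict[str, float]:
    """Split items with the given classifier and report per-group accuracy."""
    groups: Dict[str, List[bool]] = {"underspecified": [], "fully_specified": []}
    for item in items:
        key = "underspecified" if is_underspecified(item.question) else "fully_specified"
        groups[key].append(exact_match(item.prediction, item.gold))

    total = len(items)
    report = {
        "underspecified_share": len(groups["underspecified"]) / total if total else 0.0
    }
    for key, hits in groups.items():
        # NaN marks an empty group, so it is not mistaken for 0% accuracy.
        report[f"{key}_accuracy"] = sum(hits) / len(hits) if hits else float("nan")
    return report


def toy_is_underspecified(question: str) -> bool:
    # Hypothetical stand-in for the LLM-based classifier:
    # treat a question as underspecified if it names no year.
    tokens = question.rstrip("?").split()
    return not any(tok.isdigit() for tok in tokens)


# Made-up placeholder items, for illustration only.
items = [
    QAItem("Who is the richest club in the championship?", "Club A", "Club B"),
    QAItem("Who won the cup?", "Team X", "team x"),
    QAItem("Which team won the example cup in 2020?", "Team Y", "Team Y"),
    QAItem("Which team finished last in the example league in 2021?", "Team Z", "Team Z"),
]

report = accuracy_by_specification(items, toy_is_underspecified)
print(report)
```

In a real replication the `is_underspecified` callable would wrap an LLM prompt, and the rewriting experiment would rerun the same report on the rewritten questions with the `gold` fields unchanged.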
