TL;DR
This paper investigates how question reformulation affects answer selection in conversational QA, revealing models' sensitivities and providing a framework to analyze robustness and errors in existing datasets.
Contribution
It introduces a question rewriting framework to analyze and evaluate the robustness of conversational QA models, highlighting their sensitivities to question formulation.
Findings
Reading comprehension models are insensitive to question phrasing.
Passage ranking models are highly sensitive to question variations.
Question rewriting (QR) helps identify and group cases where models are vulnerable.
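One way to picture the grouping described in the findings is to compare each question's answer quality under its original wording and under a rewrite, then bucket the cases. The sketch below is illustrative only and is not the paper's implementation: the function names, the bucket labels, the per-question score format, and the 0.5 threshold are all assumptions made for this example.

```python
# Illustrative sketch only: names, labels, and threshold are hypothetical,
# not taken from the paper.

def bucket(orig_ok: bool, rewrite_ok: bool) -> str:
    """Label one question by whether the model succeeds on each wording."""
    if orig_ok and rewrite_ok:
        return "robust"
    if not orig_ok and rewrite_ok:
        return "fixed_by_rewrite"
    if orig_ok and not rewrite_ok:
        return "broken_by_rewrite"
    return "fails_both"


def group_cases(results, threshold=0.5):
    """Group question ids by outcome.

    results: iterable of (question_id, score_original, score_rewrite),
    where each score is some answer-quality metric in [0, 1]
    (assumed here; the metric itself is not specified).
    """
    groups = {}
    for qid, score_orig, score_rewrite in results:
        label = bucket(score_orig >= threshold, score_rewrite >= threshold)
        groups.setdefault(label, []).append(qid)
    return groups


if __name__ == "__main__":
    toy = [("q1", 0.9, 0.8), ("q2", 0.1, 0.7), ("q3", 0.6, 0.2)]
    print(group_cases(toy))
```

Under these assumptions, a large "fixed_by_rewrite" or "broken_by_rewrite" group would point to a model that is sensitive to question wording, while a dominant "robust" group would point to one that is not.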
Abstract
The dependency between adequate question formulation and correct answer selection is an intriguing but still underexplored area. In this paper, we show that question rewriting (QR) of the conversational context sheds more light on this phenomenon and can also be used to evaluate the robustness of different answer selection approaches. We introduce a simple framework that enables automated analysis of conversational question answering (QA) performance using question rewrites, and present the results of this analysis on the TREC CAsT and QuAC (CANARD) datasets. Our experiments uncover the sensitivity to question formulation of popular state-of-the-art models for reading comprehension and passage ranking. Our results demonstrate that the reading comprehension model is insensitive to question formulation, while the passage ranking changes dramatically with a little variation…
