Wrong Answers Can Also Be Useful: PlausibleQA -- A Large-Scale QA Dataset with Answer Plausibility Scores

Jamshid Mozafari; Abdelrahman Abdallah; Bhawna Piryani; Adam Jatowt

arXiv:2502.16358 · cs.CL · April 22, 2025

TL;DR

PlausibleQA is a large-scale QA dataset of 10,000 questions and 100,000 candidate answers, each annotated with a plausibility score and a justification. By including plausible yet incorrect answers, it supports evaluation of LLMs that goes beyond correctness alone.

Contribution

The paper introduces PlausibleQA, a novel dataset with plausibility annotations for answers, enabling more detailed assessment of LLMs beyond correctness.

Findings

01

PlausibleQA improves the evaluation of distractor generation in Multiple-Choice Question Answering (MCQA).

02

PlausibleQA enhances robustness assessment in QA Robustness Assessment (QARA) settings.

03

Human assessments confirm the dataset's utility.

Abstract

Large Language Models (LLMs) are revolutionizing information retrieval, with chatbots becoming an important source for answering user queries. Because LLMs are designed to prioritize generating correct answers, the value of highly plausible yet incorrect answers (candidate answers) tends to be overlooked. However, such answers can still prove useful: for example, they can play a crucial role in tasks like Multiple-Choice Question Answering (MCQA) and QA Robustness Assessment (QARA). Existing QA datasets primarily focus on correct answers without explicit consideration of the plausibility of other candidate answers, limiting opportunities for more nuanced evaluation of models. To address this gap, we introduce PlausibleQA, a large-scale dataset comprising 10,000 questions and 100,000 candidate answers, each annotated with plausibility scores and justifications for their selection.…
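
To make the MCQA use case concrete, the sketch below shows one way plausibility scores could drive distractor selection: keep the correct answer and add the most plausible incorrect candidates as the other options. This is an illustration under stated assumptions, not the authors' method. The `Candidate` class, its field names, the score scale (higher means more plausible), and the toy data are all hypothetical and do not reflect the actual PlausibleQA schema.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One candidate answer. Field names are hypothetical, not the dataset's schema."""
    text: str
    plausibility: float  # assumption: higher means more plausible
    is_correct: bool


def pick_distractors(candidates, k=3):
    """Return the k most plausible incorrect candidates, most plausible first."""
    wrong = [c for c in candidates if not c.is_correct]
    wrong.sort(key=lambda c: c.plausibility, reverse=True)
    return wrong[:k]


def build_mcq(question, candidates, k=3):
    """Build a multiple-choice item from the correct answer plus k distractors."""
    correct = next(c for c in candidates if c.is_correct)
    distractors = pick_distractors(candidates, k)
    # Sort options alphabetically so the answer's position does not depend on its score.
    options = sorted([correct.text] + [d.text for d in distractors])
    return {"question": question, "options": options, "answer": correct.text}


# Toy, made-up data for illustration only (not drawn from PlausibleQA).
toy = [
    Candidate("Canberra", 100.0, True),
    Candidate("Sydney", 90.0, False),
    Candidate("Melbourne", 80.0, False),
    Candidate("Perth", 40.0, False),
    Candidate("Paris", 5.0, False),
]

mcq = build_mcq("What is the capital of Australia?", toy, k=3)
print(mcq["options"])
```

Choosing the highest-scoring wrong answers yields harder items than random sampling would; in this toy example the low-plausibility candidate "Paris" is dropped in favor of other Australian cities.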

Code & Models

Repositories

DataScienceUIBK/PlausibleQA
Official

Datasets

JamshidJDMY/PlausibleQA
dataset · 30 downloads
