Question Difficulty Estimation for Large Language Models via Answer Plausibility Scoring

Jamshid Mozafari; Bhawna Piryani; Adam Jatowt

arXiv:2605.12398·cs.CL·May 13, 2026

TL;DR

The paper introduces Q-DAPS, a method that estimates how difficult a question is for large language models by computing the entropy of plausibility scores over candidate answers. It outperforms baseline approaches on four QA datasets.

Contribution

Q-DAPS is a new, scalable, and robust approach that improves question difficulty estimation by leveraging answer plausibility scores, with strong empirical and human evaluation results.

Findings

01

Q-DAPS outperforms baseline methods on four QA datasets.

02

Q-DAPS remains robust across hyperparameters, question types, and model sizes.

03

Human evaluations show strong alignment with Q-DAPS's difficulty estimates.

Abstract

Estimating question difficulty is a critical component in evaluating and improving large language models (LLMs) for question answering (QA). Existing approaches often rely on readability formulas, retrieval-based signals, or popularity statistics, which may not fully capture the reasoning challenges posed to modern LLMs. In this paper, we introduce Q-DAPS (Question Difficulty based on Answer Plausibility Scores), a novel method that estimates question difficulty by computing the entropy of plausibility scores over candidate answers. We systematically evaluate Q-DAPS on four prominent QA datasets (TriviaQA, NQ, MuSiQue, and QASC) and show that it consistently outperforms baselines. Moreover, Q-DAPS shows strong robustness across hyperparameter variations and question types. Extensive ablation studies further show that Q-DAPS remains robust across different plausibility…
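
The core idea in the abstract, entropy over plausibility scores of candidate answers, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the sum-normalization of scores into a distribution, the base-2 logarithm, and the optional scaling by log2(n) are all assumptions made for illustration, since the excerpt does not specify them. The excerpt also does not say how entropy maps to difficulty; the sketch only computes the entropy itself.

```python
import math


def plausibility_entropy(scores):
    """Shannon entropy (bits) of plausibility scores.

    Assumption: scores are non-negative and are turned into a
    probability distribution by dividing by their sum.
    """
    if not scores:
        raise ValueError("need at least one candidate answer score")
    if any(s < 0 for s in scores):
        raise ValueError("plausibility scores must be non-negative")
    total = sum(scores)
    if total == 0:
        raise ValueError("at least one score must be positive")
    probs = [s / total for s in scores]
    # Terms with p == 0 contribute nothing to the entropy.
    return -sum(p * math.log2(p) for p in probs if p > 0)


def normalized_entropy(scores):
    """Entropy scaled to [0, 1] by the maximum log2(n) (hypothetical helper)."""
    n = len(scores)
    if n < 2:
        return 0.0
    return plausibility_entropy(scores) / math.log2(n)


if __name__ == "__main__":
    # Hypothetical scores for four candidate answers to one question.
    peaked = [0.9, 0.05, 0.03, 0.02]    # one candidate clearly dominates
    flat = [0.25, 0.25, 0.25, 0.25]     # all candidates equally plausible
    print(normalized_entropy(peaked))
    print(normalized_entropy(flat))
```

In this sketch a flat score distribution yields the maximum entropy and a sharply peaked one yields a value near zero. Scaling by log2(n) keeps values comparable across questions with different numbers of candidates.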
