Estimating problem difficulty without ground truth using Large Language Model comparisons
Marthe Ballon, Andres Algaba, Brecht Verbeken, Vincent Ginis

TL;DR
This paper introduces LLM compare, a novel, scalable method for estimating problem difficulty without ground truth, using pairwise comparisons and Bradley-Terry scores, effective even on out-of-distribution problems.
Contribution
The paper presents LLM compare, a ground truth-independent, model-agnostic difficulty estimation method that aligns well with human judgments and is robust to hallucinations.
Findings
High correlation with human annotations (Pearson r ≥ 0.80)
Robust to hallucinations with less than 6% degradation
Addresses out-of-distribution problem difficulty estimation
Abstract
Recent advances in the finetuning of large language models (LLMs) have significantly improved their performance on established benchmarks, emphasizing the need for increasingly difficult, synthetic data. A key step in this data generation pipeline is a method for estimating problem difficulty. Current approaches, such as human calibration or performance-based scoring, fail to generalize to out-of-distribution problems, i.e., problems currently unsolvable by humans and LLMs, because they are time-consuming, not scalable, and dependent on ground truth. Therefore, we propose a new method for estimating problem difficulty, LLM compare, that addresses these limitations. An LLM performs pairwise difficulty comparisons, and then Bradley-Terry scores are computed based on the outcomes. To validate our method, we first propose a conceptual framework that positions existing approaches on three…
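To make the scoring step concrete, the following is a minimal sketch of how Bradley-Terry scores could be fitted from pairwise "which problem is harder" outcomes. It is not the authors' implementation: the function name `bradley_terry_scores`, the `smoothing` pseudo-count, and the toy outcomes are assumptions made for illustration, and the LLM call that would produce the outcomes is deliberately left out. The fit uses the standard minorization-maximization update for the Bradley-Terry model.

```python
import math
from collections import defaultdict


def bradley_terry_scores(comparisons, smoothing=0.1, n_iter=500, tol=1e-10):
    """Fit Bradley-Terry log-strengths from (harder, easier) outcome pairs.

    `comparisons` is a list of (harder_id, easier_id) tuples, e.g. as judged
    by an LLM. `smoothing` is an assumed pseudo-count (not from the paper)
    that keeps scores finite for problems that never "win" a comparison.
    """
    wins = defaultdict(float)
    pair_counts = defaultdict(float)
    for harder, easier in comparisons:
        wins[harder] += 1.0
        pair_counts[tuple(sorted((harder, easier)))] += 1.0

    # Add a fractional pseudo-win to both sides of every observed pair.
    for a, b in list(pair_counts):
        pair_counts[(a, b)] += 2 * smoothing
        wins[a] += smoothing
        wins[b] += smoothing

    items = sorted({x for pair in pair_counts for x in pair})
    if not items:
        return {}

    neighbours = defaultdict(list)
    for (a, b), count in pair_counts.items():
        neighbours[a].append((b, count))
        neighbours[b].append((a, count))

    strength = {i: 1.0 for i in items}
    for _ in range(n_iter):
        # MM update: p_i <- w_i / sum_j n_ij / (p_i + p_j)
        updated = {
            i: wins[i] / sum(c / (strength[i] + strength[j]) for j, c in neighbours[i])
            for i in items
        }
        # Fix the scale by normalising the geometric mean to 1.
        log_mean = sum(math.log(v) for v in updated.values()) / len(updated)
        updated = {i: v / math.exp(log_mean) for i, v in updated.items()}
        delta = max(abs(updated[i] - strength[i]) for i in items)
        strength = updated
        if delta < tol:
            break

    # Higher log-strength means the problem was judged harder more often.
    return {i: math.log(strength[i]) for i in items}


# Toy outcomes (hypothetical): "A" judged harder than "B" and "C", "B" harder than "C".
scores = bradley_terry_scores([("A", "B"), ("B", "C"), ("A", "C")])
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this sketch the resulting scores are on a log scale with an arbitrary origin, so only differences between problems are meaningful; sorting by score gives a difficulty ranking.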
Taxonomy
Topics: Text Readability and Simplification · Topic Modeling · Intelligent Tutoring Systems and Adaptive Learning
