ILDAE: Instance-Level Difficulty Analysis of Evaluation Data
Neeraj Varshney, Swaroop Mishra, and Chitta Baral

TL;DR
ILDAE is a large-scale analysis of instance difficulty in NLP evaluation data. It supports efficient evaluation, dataset improvement, model selection, and more reliable Out-of-Domain performance estimation.
Contribution
This paper presents the first large-scale analysis of instance difficulty in NLP evaluation data, demonstrating five novel applications and providing difficulty scores for 23 datasets.
Findings
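Evaluating on just 5% of instances (selected via ILDAE) achieves up to 0.93 Kendall correlation with evaluation on the complete dataset.
Computing weighted accuracy with difficulty scores yields 5.2% higher correlation with Out-of-Domain performance.
Difficulty scores also support dataset refinement by identifying erroneous and trivial instances for repair.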
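The two quantitative findings above can be illustrated with a toy sketch. It is not the authors' implementation: the difficulty scores and model predictions are made-up data, the subset rule (keep the instances whose difficulty is closest to 0.5) is a hypothetical stand-in for ILDAE's selection procedure, and weighting each instance by its difficulty score is an assumption about how weighted accuracy is computed. Kendall correlation is implemented by hand (tau-a, no tie correction) so the block needs only the standard library.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall tau-a between two equal-length score lists (tied pairs add 0)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


def accuracy(correct, indices=None):
    """Plain accuracy over all instances, or over a chosen subset of indices."""
    idx = list(range(len(correct))) if indices is None else list(indices)
    return sum(correct[i] for i in idx) / len(idx)


def weighted_accuracy(correct, difficulty):
    """Accuracy where each instance counts in proportion to its difficulty (assumed weighting)."""
    return sum(d * c for d, c in zip(difficulty, correct)) / sum(difficulty)


def select_subset(difficulty, k, target=0.5):
    """Hypothetical selection rule: the k instances whose difficulty is closest to `target`."""
    ranked = sorted(range(len(difficulty)), key=lambda i: abs(difficulty[i] - target))
    return sorted(ranked[:k])


# Made-up difficulty scores in [0, 1] for eight evaluation instances.
difficulty = [0.1, 0.2, 0.4, 0.5, 0.6, 0.8, 0.9, 0.3]

# Made-up per-instance correctness (1 = correct) for three hypothetical models.
predictions = {
    "model_a": [1, 1, 1, 1, 1, 1, 0, 1],
    "model_b": [1, 1, 1, 1, 0, 0, 0, 1],
    "model_c": [1, 1, 0, 0, 0, 0, 0, 1],
}

subset = select_subset(difficulty, k=4)
full_scores = [accuracy(c) for c in predictions.values()]
subset_scores = [accuracy(c, subset) for c in predictions.values()]

# How well does the small subset reproduce the full-dataset model ranking?
tau = kendall_tau(full_scores, subset_scores)

# Difficulty-weighted accuracy per model.
weighted = {name: weighted_accuracy(c, difficulty) for name, c in predictions.items()}

print("subset indices:", subset)
print("Kendall tau (full vs. subset):", tau)
print("weighted accuracy:", weighted)
```

In this toy data the subset preserves the full-dataset ranking of the three models. On real benchmarks the agreement depends on the selection rule and on how the difficulty scores were obtained.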
Abstract
Knowledge of questions' difficulty level helps a teacher in several ways, such as estimating students' potential quickly by asking carefully selected questions and improving quality of examination by modifying trivial and hard questions. Can we extract such benefits of instance difficulty in NLP? To this end, we conduct Instance-Level Difficulty Analysis of Evaluation data (ILDAE) in a large-scale setup of 23 datasets and demonstrate its five novel applications: 1) conducting efficient-yet-accurate evaluations with fewer instances saving computational cost and time, 2) improving quality of existing evaluation datasets by repairing erroneous and trivial instances, 3) selecting the best model based on application requirements, 4) analyzing dataset characteristics for guiding future data creation, 5) estimating Out-of-Domain performance reliably. Comprehensive experiments for these…
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Educational Technology and Assessment · Machine Learning and Data Classification
