ILDAE: Instance-Level Difficulty Analysis of Evaluation Data

Neeraj Varshney, Swaroop Mishra, and Chitta Baral

arXiv:2203.03073·cs.CL·March 10, 2022

TL;DR

ILDAE is a large-scale analysis of instance difficulty in NLP evaluation data that supports efficient evaluation, dataset improvement, model selection, and more reliable Out-of-Domain performance estimation.

Contribution

This paper presents the first large-scale analysis of instance difficulty in NLP evaluation data, demonstrating five novel applications and providing difficulty scores for 23 datasets.

Findings

01

Evaluating on just 5% of instances (selected via ILDAE) achieves a Kendall correlation as high as 0.93 with evaluation on the complete dataset.

02

Computing weighted accuracy with difficulty scores yields 5.2% higher correlation with Out-of-Domain performance.

03

Difficulty scores also support dataset refinement by helping to identify erroneous and trivial instances for repair.
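
The two quantitative findings above can be illustrated with a minimal sketch. It is not the authors' implementation: the difficulty scores, the model predictions, and the chosen subset are invented toy values, and the helper names (`weighted_accuracy`, `accuracy`, `kendall_tau`) are hypothetical. The sketch assumes per-instance difficulty scores are already available, uses Kendall's tau as the rank correlation measure, and takes the evaluation subset as given rather than reproducing any particular selection rule.

```python
from itertools import combinations


def weighted_accuracy(correct, difficulty):
    """Accuracy where each instance counts in proportion to its difficulty score."""
    total = sum(difficulty)
    if total == 0:
        raise ValueError("difficulty scores must not all be zero")
    return sum(d for c, d in zip(correct, difficulty) if c) / total


def accuracy(correct, indices=None):
    """Plain accuracy over all instances, or over a chosen subset of indices."""
    indices = list(range(len(correct)) if indices is None else indices)
    return sum(correct[i] for i in indices) / len(indices)


def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) / pairs; tied pairs count as zero."""
    pairs = list(combinations(range(len(x)), 2))
    score = 0
    for i, j in pairs:
        s = (x[i] - x[j]) * (y[i] - y[j])
        score += (s > 0) - (s < 0)
    return score / len(pairs)


# Toy data, invented for illustration: 1 means the model answered correctly.
difficulty = [0.1, 0.2, 0.5, 0.7, 0.9, 0.3]
predictions = {
    "model_a": [1, 1, 1, 1, 0, 1],
    "model_b": [1, 1, 0, 0, 0, 1],
    "model_c": [1, 0, 1, 0, 0, 0],
}
subset = [2, 3, 4]  # hypothetical subset; the real selection rule is not shown here

full_scores = [accuracy(p) for p in predictions.values()]
subset_scores = [accuracy(p, subset) for p in predictions.values()]
weighted_scores = [weighted_accuracy(p, difficulty) for p in predictions.values()]

# How well does the ranking of models on the subset agree with the full ranking?
tau = kendall_tau(full_scores, subset_scores)
print(f"Kendall tau (subset vs. full): {tau:.3f}")
print("Difficulty-weighted accuracy:", [round(s, 3) for s in weighted_scores])
```

A tau close to 1 would mean the subset ranks the models almost exactly as the full dataset does, which is the property an efficient evaluation needs. The weighted score rewards a model more for solving hard instances than easy ones.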

Abstract

Knowledge of questions' difficulty level helps a teacher in several ways, such as estimating students' potential quickly by asking carefully selected questions and improving quality of examination by modifying trivial and hard questions. Can we extract such benefits of instance difficulty in NLP? To this end, we conduct Instance-Level Difficulty Analysis of Evaluation data (ILDAE) in a large-scale setup of 23 datasets and demonstrate its five novel applications: 1) conducting efficient-yet-accurate evaluations with fewer instances saving computational cost and time, 2) improving quality of existing evaluation datasets by repairing erroneous and trivial instances, 3) selecting the best model based on application requirements, 4) analyzing dataset characteristics for guiding future data creation, 5) estimating Out-of-Domain performance reliably. Comprehensive experiments for these…

Code & Models

Repositories

nrjvarshney/ildae (Official)

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Educational Technology and Assessment · Machine Learning and Data Classification