On Evaluation of Vision Datasets and Models using Human Competency Frameworks

Rahul Ramachandran; Tejal Kulkarni; Charchit Sharma; Deepak Vijaykeerthy; Vineeth N Balasubramanian

arXiv:2409.04041·cs.CV·September 9, 2024

Open Access

TL;DR

This paper explores Item Response Theory (IRT) for evaluating computer vision datasets and models. By inferring interpretable latent parameters for an ensemble of models and for each dataset item, IRT supports a richer analysis than a single accuracy score.

Contribution

It applies IRT to computer vision evaluation, enabling detailed analysis of models and datasets beyond simple accuracy scores.

Findings

01. IRT reveals model calibration differences.

02. IRT identifies informative data subsets.

03. Latent parameters aid in model and dataset comparison.

Abstract

Evaluating models and datasets in computer vision remains a challenging task, with most leaderboards relying solely on accuracy. While accuracy is a popular metric for model evaluation, it provides only a coarse assessment by considering a single model's score on all dataset items. This paper explores Item Response Theory (IRT), a framework that infers interpretable latent parameters for an ensemble of models and each dataset item, enabling richer evaluation and analysis beyond the single accuracy number. Leveraging IRT, we assess model calibration, select informative data subsets, and demonstrate the usefulness of its latent parameters for analyzing and comparing models and datasets in computer vision.
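To make the idea of latent parameters concrete, here is a minimal sketch of fitting an IRT model to a binary response matrix (rows are models, columns are dataset items, 1 means the model classified the item correctly). The abstract does not say which IRT variant the authors use, so the two-parameter logistic (2PL) form below is an assumption, as are the function names `fit_2pl` and `item_information`, the standard-normal priors, and the synthetic data. In 2PL, the probability that model i answers item j correctly is sigmoid(a_j * (theta_i - b_j)), where theta_i is the model's ability, b_j the item's difficulty, and a_j the item's discrimination.

```python
import numpy as np

def sigmoid(z):
    # Clip to avoid overflow in exp for extreme logits.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def fit_2pl(responses, n_steps=2000, lr=0.05, seed=0):
    """Fit a 2PL IRT model by gradient ascent on a penalized log-likelihood.

    responses: (n_models, n_items) binary matrix.
    Returns (theta, b, a): ability per model, difficulty and
    discrimination per item. Hypothetical helper, not the paper's code.
    """
    rng = np.random.default_rng(seed)
    n_models, n_items = responses.shape
    theta = rng.normal(0.0, 0.1, n_models)
    b = rng.normal(0.0, 0.1, n_items)
    log_a = np.zeros(n_items)  # optimize log(a) so that a stays positive
    for _ in range(n_steps):
        a = np.exp(log_a)
        diff = theta[:, None] - b[None, :]
        resid = responses - sigmoid(a * diff)  # d log-lik / d logit
        # Gradients include an assumed N(0, 1) prior on each parameter.
        grad_theta = (resid * a).sum(axis=1) - theta
        grad_b = (-resid * a).sum(axis=0) - b
        grad_log_a = (resid * diff).sum(axis=0) * a - log_a
        # Divide by the number of summed terms to keep step sizes stable.
        theta += lr * grad_theta / n_items
        b += lr * grad_b / n_models
        log_a += lr * grad_log_a / n_models
    return theta, b, np.exp(log_a)

def item_information(theta, b, a):
    """Fisher information of each item at each ability: a^2 * p * (1 - p)."""
    p = sigmoid(a[None, :] * (theta[:, None] - b[None, :]))
    return (a[None, :] ** 2) * p * (1.0 - p)

# Synthetic demo: 50 hypothetical models evaluated on 200 hypothetical items.
rng = np.random.default_rng(42)
n_models, n_items = 50, 200
theta_true = rng.normal(size=n_models)
b_true = rng.normal(size=n_items)
a_true = rng.uniform(0.8, 2.0, size=n_items)
p_true = sigmoid(a_true * (theta_true[:, None] - b_true))
responses = (rng.random((n_models, n_items)) < p_true).astype(float)

theta_hat, b_hat, a_hat = fit_2pl(responses)
info = item_information(theta_hat, b_hat, a_hat)

# One simple way to pick an informative subset: rank items by the total
# information they carry across the estimated abilities.
top_items = np.argsort(info.sum(axis=0))[::-1][:20]
print("ability vs. truth correlation:", np.corrcoef(theta_hat, theta_true)[0, 1])
print("most informative items:", top_items)
```

Unlike accuracy, which collapses each row of the response matrix to one number, the fitted parameters separate how capable a model is from how hard and how discriminating each item is. Ranking items by information is only one possible selection rule; the paper's own subset-selection procedure may differ.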

Taxonomy

Topics: Robotics and Automated Systems · Graph Theory and Algorithms · Spatial Cognition and Navigation