Standing on the shoulders of giants

Lucas Felipe Ferraro Cardoso; José de Sousa Ribeiro Filho; Vitor Cirilo Araujo Santos; Regiane Silva Kawasaki Frances; Ronnie Cley de Oliveira Alves

arXiv:2409.03151·cs.LG·September 9, 2024


TL;DR

This paper explores how Item Response Theory (IRT) can complement traditional confusion matrix metrics to better evaluate machine learning models by analyzing their performance at the instance level and revealing nuanced differences.

Contribution

It introduces a method to integrate IRT into model evaluation, providing a new layer of insight beyond classical metrics.

Findings

01. IRT complements classical metrics by revealing fine-grained model behaviors.

02. There is a 97% confidence that IRT scores differ from most classical metrics.

03. IRT offers a new perspective for selecting models with similar overall performance.

Abstract

Although fundamental to the advancement of Machine Learning, the classic evaluation metrics extracted from the confusion matrix, such as precision and F1, are limited. Such metrics only offer a quantitative view of the models' performance, without considering the complexity of the data or the quality of the hit. To overcome these limitations, recent research has introduced the use of psychometric metrics such as Item Response Theory (IRT), which allows an assessment at the level of latent characteristics of instances. This work investigates how IRT concepts can enrich a confusion matrix in order to identify which model is the most appropriate among options with similar performance. In the study carried out, IRT does not replace, but complements classical metrics by offering a new layer of evaluation and observation of the fine behavior of models in specific instances. It was also…
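The idea that two models can tie on confusion-matrix metrics while differing at the instance level can be illustrated with a small sketch. This is not the authors' method: it assumes the standard three-parameter logistic (3PL) IRT model, and the data, the per-instance parameters, the reference ability `theta_ref`, and the helper `difficulty_credit` are all hypothetical, chosen only to make the contrast visible.

```python
import math

def icc_3pl(theta, a, b, c):
    """Standard 3PL item characteristic curve: probability that a respondent
    with ability theta answers correctly an item with discrimination a,
    difficulty b and guessing c."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def precision_f1(y_true, y_pred):
    """Precision and F1 for the positive class, from confusion-matrix counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, f1

def difficulty_credit(y_true, y_pred, items, theta_ref=0.0):
    """Hypothetical instance-level score (an assumption of this sketch):
    each correct prediction earns 1 - P(correct | theta_ref), so hits on
    instances that a reference respondent would likely miss count more."""
    return sum(
        1.0 - icc_3pl(theta_ref, a, b, c)
        for t, p, (a, b, c) in zip(y_true, y_pred, items)
        if t == p
    )

# Hypothetical data: six instances, three positive and three negative.
y_true = [1, 1, 1, 0, 0, 0]
# Hypothetical (a, b, c) per instance: easy (b=-1), medium (b=0), hard (b=2).
items = [(1.0, -1.0, 0.0), (1.0, 0.0, 0.0), (1.0, 2.0, 0.0)] * 2

model_a = [1, 1, 0, 0, 0, 1]  # wrong only on the two hard instances
model_b = [0, 1, 1, 1, 0, 0]  # wrong only on the two easy instances

metrics_a = precision_f1(y_true, model_a)
metrics_b = precision_f1(y_true, model_b)
score_a = difficulty_credit(y_true, model_a, items)
score_b = difficulty_credit(y_true, model_b, items)

print("precision, F1 (A):", metrics_a)
print("precision, F1 (B):", metrics_b)
print("difficulty credit A vs B:", round(score_a, 3), round(score_b, 3))
```

Both hypothetical models have the same true-positive, false-positive and false-negative counts, so precision and F1 cannot separate them, while the difficulty-aware score favors the model whose hits fall on the harder instances. In practice the item parameters would be estimated from the responses of many models rather than set by hand.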


Code & Models

Repositories

LucasFerraroCardoso/IRT_Confusion_Matrix (official)


Taxonomy

Topics: Psychometric Methodologies and Testing · Machine Learning and Data Classification · Explainable Artificial Intelligence (XAI)