Comparing Classifiers: A Case Study Using PyCM

Sadra Sabouri; Alireza Zolanvari; Sepand Haghighi

arXiv:2602.13482 · cs.LG · February 17, 2026

TL;DR

This paper demonstrates how the PyCM library can be used for detailed evaluation of multi-class classifiers, highlighting the importance of multi-dimensional metrics to uncover subtle performance differences.

Contribution

The paper provides a tutorial on PyCM and shows how the choice of evaluation metrics can significantly change the interpretation of a model in multi-class classification.
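To make the per-class view concrete, here is a minimal plain-Python sketch. It does not use PyCM's API. It builds a multi-class confusion matrix and derives per-class precision (PPV), recall (TPR), and F1 by hand. The helper names `confusion_matrix` and `per_class_stats` and the toy label vectors are hypothetical and chosen only for illustration.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Return (classes, matrix), where matrix[a][p] counts samples of true class a predicted as p."""
    classes = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))  # missing (a, p) pairs count as 0
    matrix = {a: {p: counts[(a, p)] for p in classes} for a in classes}
    return classes, matrix

def per_class_stats(classes, matrix):
    """Per-class precision (PPV), recall (TPR) and F1 computed from the confusion matrix."""
    stats = {}
    for c in classes:
        tp = matrix[c][c]
        fn = sum(matrix[c][p] for p in classes) - tp  # true class c, predicted as something else
        fp = sum(matrix[a][c] for a in classes) - tp  # predicted c, true class is something else
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        stats[c] = {"PPV": precision, "TPR": recall, "F1": f1}
    return stats

# Hypothetical toy labels for a three-class problem.
actual = [0, 0, 0, 1, 1, 2, 2, 2]
predicted = [0, 0, 1, 1, 2, 2, 2, 0]

classes, matrix = confusion_matrix(actual, predicted)
stats = per_class_stats(classes, matrix)
for c in classes:
    print(c, matrix[c], {name: round(value, 3) for name, value in stats[c].items()})
```

PyCM computes these statistics, and many more, from a `ConfusionMatrix` built from actual and predicted label vectors. The hand-rolled version here only exposes the underlying arithmetic.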

Findings

01. Multi-dimensional evaluation reveals small but important performance differences.

02. Standard metrics may overlook subtle trade-offs.

03. The choice of evaluation metrics significantly affects how a model is assessed.

Abstract

Selecting an optimal classification model requires a robust and comprehensive understanding of the model's performance. This paper provides a tutorial on the PyCM library, demonstrating its utility for in-depth evaluation of multi-class classifiers. By examining two case scenarios, we illustrate how the choice of evaluation metrics can fundamentally shift the interpretation of a model's efficacy. Our findings emphasize that a multi-dimensional evaluation framework is essential for uncovering small but important differences in model performance, because standard metrics alone may miss these subtle trade-offs.
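The following toy example is not one of the paper's case scenarios. It shows how the choice of metric can reverse a ranking on imbalanced data. Classifier A always predicts the majority class. Classifier B recovers both minority classes at the cost of some majority-class errors. The label vectors and the helpers `accuracy` and `macro_f1` are hypothetical plain Python. PyCM reports comparable overall statistics (for example `Overall_ACC` and `F1_Macro`), but this sketch does not use the library.

```python
def accuracy(actual, predicted):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

def macro_f1(actual, predicted):
    """Unweighted mean of per-class F1, using F1 = 2TP / (2TP + FP + FN)."""
    classes = sorted(set(actual) | set(predicted))
    scores = []
    for c in classes:
        tp = sum(a == c and p == c for a, p in zip(actual, predicted))
        fp = sum(a != c and p == c for a, p in zip(actual, predicted))
        fn = sum(a == c and p != c for a, p in zip(actual, predicted))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical imbalanced labels: eight samples of class 0, one each of classes 1 and 2.
actual = [0] * 8 + [1, 2]
pred_a = [0] * 10                          # always predicts the majority class
pred_b = [0] * 5 + [1, 1, 2] + [1, 2]      # three majority errors, both minority samples correct

for name, pred in [("A", pred_a), ("B", pred_b)]:
    print(name, round(accuracy(actual, pred), 3), round(macro_f1(actual, pred), 3))
# A 0.8 0.296
# B 0.7 0.645
```

A wins on accuracy (0.8 vs 0.7), while B wins on macro-averaged F1 (about 0.645 vs 0.296). Macro-averaging weights each class equally, and A scores zero F1 on both minority classes.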

Taxonomy

Topics: Imbalanced Data Classification Techniques · Domain Adaptation and Few-Shot Learning · Explainable Artificial Intelligence (XAI)