Deep Neural Network Benchmarks for Selective Classification
Andrea Pugnana, Lorenzo Perini, Jesse Davis, Salvatore Ruggieri

TL;DR
This paper benchmarks 18 selective classification methods based on deep neural networks across 44 diverse datasets, providing comprehensive insights into their relative performance and suitability for different objectives in trustworthy AI deployment.
Contribution
It offers the first extensive empirical comparison of multiple deep learning-based selective classification approaches on a large, diverse dataset collection.
Findings
No single method outperforms others across all metrics.
Performance varies depending on the specific evaluation criterion and dataset.
Practitioners should choose methods based on their specific accuracy and coverage needs.
Abstract
With the increasing deployment of machine learning models in many socially sensitive tasks, there is a growing demand for reliable and trustworthy predictions. One way to accomplish these requirements is to allow a model to abstain from making a prediction when there is a high risk of making an error. This requires adding a selection mechanism to the model, which selects those examples for which the model will provide a prediction. The selective classification framework aims to design a mechanism that balances the fraction of rejected predictions (i.e., the proportion of examples for which the model does not make a prediction) versus the improvement in predictive performance on the selected predictions. Multiple selective classification frameworks exist, most of which rely on deep neural network architectures. However, the empirical evaluation of the existing approaches is still limited…
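The selection mechanism described in the abstract can be illustrated with a minimal sketch. It assumes the simplest possible rule: predict only when the top class probability reaches a fixed threshold, and abstain otherwise. This rule, the function names `selective_predict` and `coverage_and_selective_accuracy`, the threshold of 0.7, and the toy data are all assumptions made for illustration; they are not taken from the paper and do not represent any particular benchmarked method.

```python
import numpy as np

def selective_predict(probs, threshold):
    """Return predicted labels and a boolean mask of accepted examples.

    Hypothetical selection rule: accept an example only when the highest
    class probability is at least `threshold`; otherwise abstain.
    """
    probs = np.asarray(probs, dtype=float)
    preds = probs.argmax(axis=1)        # most likely class per example
    confidence = probs.max(axis=1)      # its probability
    accepted = confidence >= threshold  # True = predict, False = abstain
    return preds, accepted

def coverage_and_selective_accuracy(preds, accepted, y_true):
    """Coverage is the fraction of accepted examples; selective accuracy
    is the accuracy measured on the accepted examples only."""
    y_true = np.asarray(y_true)
    coverage = accepted.mean()
    if not accepted.any():
        return coverage, float("nan")   # nothing accepted, accuracy undefined
    selective_acc = (preds[accepted] == y_true[accepted]).mean()
    return coverage, selective_acc

# Toy class probabilities for four examples (made-up numbers).
probs = np.array([
    [0.90, 0.10],
    [0.55, 0.45],
    [0.20, 0.80],
    [0.50, 0.50],
])
y_true = np.array([0, 1, 1, 0])

preds, accepted = selective_predict(probs, threshold=0.7)
coverage, sel_acc = coverage_and_selective_accuracy(preds, accepted, y_true)
full_acc = (preds == y_true).mean()

print(f"coverage={coverage:.2f}, selective accuracy={sel_acc:.2f}, "
      f"full accuracy={full_acc:.2f}")
```

Raising the threshold rejects more examples (lower coverage) in exchange for fewer errors on the examples that remain, which is the trade-off the abstract describes.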
Taxonomy
Topics: Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning · Anomaly Detection Techniques and Applications
Methods: Sparse Evolutionary Training
