Selective Prediction via Training Dynamics
Stephan Rabanser, Anvith Thudi, Kimia Hamidieh, Adam Dziedzic, Israfil Bahceci, Akram Bin Sediq, Hamza Sokun, Nicolas Papernot

TL;DR
This paper introduces a training-dynamics-based framework for selective prediction that improves the trade-off between coverage and utility without modifying the model architecture or training objective, and that applies to classification, regression, and time series tasks.
Contribution
The authors propose a novel, domain-agnostic method using training dynamics to enhance selective prediction performance without altering existing training procedures.
Findings
Outperforms state-of-the-art methods on image classification benchmarks.
Effective across classification, regression, and time series tasks.
Does not require changes to model architecture or training process.
Abstract
Selective Prediction is the task of rejecting inputs a model would predict incorrectly on. This involves a trade-off between input space coverage (how many data points are accepted) and model utility (how good is the performance on accepted data points). Current methods for selective prediction typically impose constraints on either the model architecture or the optimization objective; this inhibits their usage in practice and introduces unknown interactions with pre-existing loss functions. In contrast to prior work, we show that state-of-the-art selective prediction performance can be attained solely from studying the (discretized) training dynamics of a model. We propose a general framework that, given a test input, monitors metrics capturing the instability of predictions from intermediate models (i.e., checkpoints) obtained during training w.r.t. the final model's prediction. In…
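The idea described in the abstract can be illustrated with a minimal sketch for the classification case. The abstract is cut off before it specifies the metrics, so everything concrete here is an assumption made for illustration: the function names, the weighted-disagreement score, the weighting exponent `k`, and the threshold `tau` are hypothetical and are not taken from the paper. The sketch scores each test input by how often intermediate checkpoints disagree with the final model's prediction, accepts inputs whose score falls at or below a threshold, and reports the resulting coverage and accuracy on accepted inputs.

```python
import numpy as np


def disagreement_score(checkpoint_preds, final_preds, k=1.0):
    """Score how unstable each input's prediction was during training.

    checkpoint_preds: (T, N) integer labels predicted by T intermediate
        checkpoints, in training order, for N test inputs.
    final_preds: (N,) integer labels predicted by the final model.
    k: hypothetical exponent; larger values weight late checkpoints more.

    Returns an (N,) array in [0, 1]; higher means more disagreement with
    the final prediction. The weighting scheme is an assumption of this
    sketch, not the paper's definition.
    """
    checkpoint_preds = np.asarray(checkpoint_preds)
    final_preds = np.asarray(final_preds)
    num_checkpoints = checkpoint_preds.shape[0]
    # Weight checkpoint t (1-indexed) by (t / T) ** k.
    weights = (np.arange(1, num_checkpoints + 1) / num_checkpoints) ** k
    # 1.0 where a checkpoint disagrees with the final model, else 0.0.
    disagree = (checkpoint_preds != final_preds[None, :]).astype(float)
    return weights @ disagree / weights.sum()


def coverage_and_accuracy(scores, final_preds, labels, tau):
    """Accept inputs with score <= tau; return (coverage, accuracy on accepted)."""
    scores = np.asarray(scores)
    final_preds = np.asarray(final_preds)
    labels = np.asarray(labels)
    accepted = scores <= tau
    coverage = accepted.mean()
    if not accepted.any():
        return coverage, float("nan")
    accuracy = (final_preds[accepted] == labels[accepted]).mean()
    return coverage, accuracy


if __name__ == "__main__":
    # Toy data: 3 checkpoints, 3 test inputs.
    ckpt = [[0, 1, 1],
            [0, 1, 0],
            [0, 0, 0]]
    final = [0, 0, 0]
    labels = [0, 1, 0]
    scores = disagreement_score(ckpt, final)
    print(scores)
    print(coverage_and_accuracy(scores, final, labels, tau=0.2))
```

Sweeping `tau` over the observed scores traces out the coverage/utility curve mentioned in the abstract. Note that the sketch needs only saved checkpoints and their predictions, consistent with the claim that neither the architecture nor the training objective has to change.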
Peer Reviews
Decision: Submitted to ICLR 2024
The points of strengths include: 1- The method works for several tasks including classification, regression, and time series. 2- The method seems to outperform the previous state-of-the-art selective classification methods. 3- Several experimental results presented
The points of weaknesses include: 1- The proposed idea lacks novelty as it is very similar to using ensembles of models. The difference here is that the ensembles are generated on a fixed schedule from the training dynamics. 2- Checkpoints are chosen based on a fixed schedule which can correspond to models of bad performance. A better approach is to follow the approach from [Huang et al. 2017] which constructs an ensemble by choosing points of good performance using a cyclic learning rate. Us
I find the paper well written, clearly presenting each relevant concept and experiment. The method is simple, which facilitates its adoption by ML practitioners. The experiments are convincing.
- The novelty of the method is limited, the ideas of re-using past checkpoints to form an ensemble can be found in e.g. [1] - The results for SPTD and Deep Ensemble (DE) are both relatively close to one another and it would be nice to derive conditions under which one method is expected to be better than the other. - It is unclear how the performance of SPTD is tied to optimization noise. Especially, regression experiments use full-batch gradient descent, how would the results evolve when usin
Looking at the training dynamics to gauge the prediction reliability at a test point is a refreshingly interesting idea. Despite its simple formulation, I consider the idea novel -- in fact, simplicity in implementation is a plus to me. The paper is also reasonably well-written. It is a pleasure to read this paper. All discussion points and experiment highlights are well-organized, which makes the core idea very digestible. I also appreciate the extensive results with a lot of ablation studies.
Despite the above strengths, I still have a few doubts regarding the practicality of this paper: First, the results are presented in a way that gives the impression that one can control the coverage. How is it possible in practice? I understand that the threshold can be adjusted to meet a certain coverage level on the training set but I am not sure how we could do that for the unseen test set. In other words, I feel that setting tau algorithmically should be part of the solution. Second,
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)
