Set Learning for Accurate and Calibrated Models
Lukas Muttenthaler, Robert A. Vandermeulen, Qiuyi Zhang, Thomas Unterthiner, Klaus-Robert Müller

TL;DR
This paper introduces odd-$k$-out learning (OKO), a novel set-based training method that improves model accuracy and calibration by capturing data correlations, especially in limited or imbalanced datasets, without extra calibration tuning.
Contribution
The paper proposes OKO, a new set-based learning framework that enhances calibration and accuracy, with theoretical analysis and broad applicability.
Findings
OKO improves calibration even with hard labels.
OKO outperforms standard methods in limited data regimes.
No additional calibration tuning needed with OKO.
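The calibration claims in the findings above can be made concrete with a standard metric. The sketch below computes a binned expected calibration error (ECE) with NumPy. This summary does not say which calibration metric the paper reports, so the choice of ECE, the function name `expected_calibration_error`, and the 10-bin default are assumptions for illustration only.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE: weighted mean gap between confidence and accuracy.

    probs  : (n_examples, n_classes) predicted class probabilities
    labels : (n_examples,) integer class labels
    n_bins : number of equal-width confidence bins (illustrative default)
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    confidences = probs.max(axis=1)       # confidence of the predicted class
    predictions = probs.argmax(axis=1)    # predicted class
    correct = (predictions == labels).astype(float)

    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap    # weight by fraction of examples in bin
    return ece
```

A model that predicts with 85% confidence but is right only 75% of the time has an ECE of about 0.10 under this definition; a perfectly calibrated model scores 0.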
Abstract
Model overconfidence and poor calibration are common in machine learning and difficult to account for when applying standard empirical risk minimization. In this work, we propose a novel method to alleviate these problems that we call odd-$k$-out learning (OKO), which minimizes the cross-entropy error for sets rather than for single examples. This naturally allows the model to capture correlations across data examples and achieves both better accuracy and calibration, especially in limited training data and class-imbalanced regimes. Perhaps surprisingly, OKO often yields better calibration even when training with hard labels and dropping any additional calibration parameter tuning, such as temperature scaling. We demonstrate this in extensive experimental analyses and provide a mathematical theory to interpret our findings. We emphasize that OKO is a general framework that can be easily…
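The abstract describes minimizing cross-entropy over sets rather than single examples. The following is a minimal sketch of one way this could look. It assumes each set holds two examples from a shared "pair" class plus $k$ odd examples drawn from other, mutually distinct classes, and that per-example logits are summed before a softmax over classes. Neither assumption is stated in this summary, and the names `sample_oko_set` and `set_cross_entropy` are hypothetical, not the authors' code.

```python
import numpy as np

def sample_oko_set(labels, k, rng):
    """Sample indices for one set: 2 'pair' examples + k odd examples.

    Assumption: odd examples come from k distinct classes, none equal to
    the pair class, so the pair class is the unique majority in the set.
    Requires at least k + 1 classes in `labels`.
    """
    labels = np.asarray(labels)
    classes = np.unique(labels)
    # Only classes with at least two examples can act as the pair class.
    eligible = [c for c in classes if np.sum(labels == c) >= 2]
    pair_class = rng.choice(eligible)
    pair_idx = rng.choice(np.flatnonzero(labels == pair_class), size=2, replace=False)
    odd_classes = rng.choice(classes[classes != pair_class], size=k, replace=False)
    odd_idx = [rng.choice(np.flatnonzero(labels == c)) for c in odd_classes]
    return np.concatenate([pair_idx, np.array(odd_idx)]), pair_class

def set_cross_entropy(logits, pair_class):
    """Cross-entropy of the summed set logits against the pair class.

    logits : (set_size, n_classes) per-example logits from any classifier
    """
    set_logits = logits.sum(axis=0)                # aggregate over set members
    set_logits = set_logits - set_logits.max()     # numerical stability
    log_probs = set_logits - np.log(np.exp(set_logits).sum())
    return -log_probs[pair_class]
```

In a training loop, `logits` would come from applying the same network to every member of the sampled set; at test time the network can still be applied to single examples, since only the loss operates on sets in this sketch.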
Peer Reviews
Decision: ICLR 2024 poster
The paper discusses the existing literature on the topic quite extensively and deals with an extremely pressing issue in the machine learning community, providing a new method to address the problem. As such, it has the potential to have significant impact, and the paper provides experimental evidence that the proposed method outperforms known results in terms of calibration and accuracy on standard benchmarks.
Given how close the newly proposed method is, in essence, to label smoothing with batch balancing, it might be worthwhile to discuss in much more detail how the proposed method in fact differs from label smoothing, especially in the main body of the paper. Giving more prominence to Proposition 3, which is proved in the appendix, could improve the paper's interpretability. On page two, in the paragraph "Empirical", the authors write "OKO is a principled approach that changes the learning objecti
The proposed loss function seems to be interesting and numerical results show good performance.
The theoretical underpinnings presented in the paper are not robustly developed. The organization of the manuscript lacks coherence, with various concepts introduced but not adequately interconnected or explicated. For further elaboration on these points, please refer to the questions outlined below.
The language is overall fine. The experimental results seem convincing.
OKO is presented only briefly, through how instances are constructed. The overall procedure remains unclear. The properties of OKO are mentioned succinctly, but the theoretical reasons why OKO outperforms classical classifiers are not made clear. The case considered in the experiments seems to be a very special one, and it is difficult to see whether the conclusions drawn can be generalized.
Taxonomy
Topics: Machine Learning and Data Classification · Machine Learning and Algorithms · Adversarial Robustness in Machine Learning
