Calibration of Ordinal Regression Networks
Daehwan Kim, Haejun Chung, Ikbeom Jang

TL;DR
This paper introduces a novel loss function for ordinal regression networks that improves the calibration and unimodality of predictions, addressing the over-confidence inherent in models trained with standard cross-entropy.
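Calibration is usually quantified with the expected calibration error (ECE). ECE measures the gap between confidence and accuracy within confidence bins, then averages those gaps weighted by bin size. The summary does not say which estimator the paper uses, so what follows is only a minimal NumPy sketch of the common top-label, equal-width-bin variant. The function name and the default of 15 bins are assumptions.

```python
import numpy as np

def expected_calibration_error(probs: np.ndarray, labels: np.ndarray, n_bins: int = 15) -> float:
    """Top-label ECE with equal-width confidence bins (illustrative sketch).

    probs: (N, K) predicted class probabilities; labels: (N,) integer targets.
    """
    confidences = probs.max(axis=1)          # confidence = top predicted probability
    predictions = probs.argmax(axis=1)
    correct = (predictions == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # (lo, hi] bins; the top probability is >= 1/K > 0, so every sample lands in a bin
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap       # weight by the fraction of samples in the bin
    return float(ece)
```

An over-confident model, one whose confidence exceeds its accuracy, produces a positive gap in the affected bins. Top-label ECE also ignores how far a wrong prediction lies from the true rank.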
Contribution
It proposes an ordinal-aware calibration loss that combines soft ordinal encoding with ordinal-aware regularization, achieving state-of-the-art calibration on ordinal regression tasks.
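The summary does not give the exact form of the soft ordinal encoding. A widely used form (SORD-style soft labels) replaces the one-hot target with a softmax over negative distances between ranks, so neighboring classes receive more probability mass than distant ones. The sketch below makes two assumptions: an absolute-distance penalty with a temperature `tau`, and plain cross-entropy against the soft target. `soft_ordinal_encoding`, `soft_cross_entropy` and `tau` are illustrative names, not the paper's API. The ordinal-aware regularization term is not reproduced because its definition is not given here.

```python
import numpy as np

def soft_ordinal_encoding(true_rank: int, num_classes: int, tau: float = 1.0) -> np.ndarray:
    """SORD-style soft label: softmax over negative rank distances.

    Assumption: distance |r_t - r_i| / tau; the paper's encoding may differ.
    """
    ranks = np.arange(num_classes)
    logits = -np.abs(ranks - true_rank) / tau
    logits -= logits.max()                   # shift for numerical stability
    q = np.exp(logits)
    return q / q.sum()                       # normalize to a probability distribution

def soft_cross_entropy(logits: np.ndarray, target: np.ndarray) -> float:
    """Cross-entropy between softmax(logits) and a soft target distribution."""
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())   # log-softmax
    return float(-(target * log_probs).sum())
```

For example, `soft_ordinal_encoding(2, 5)` yields a target that peaks at rank 2 and is symmetric around it. A smaller `tau` sharpens the target toward one-hot.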
Findings
Achieves state-of-the-art calibration across benchmarks.
Maintains high classification accuracy.
Addresses over-confidence and unimodality issues.
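Here, unimodality means that the predicted class distribution has a single peak and decreases monotonically away from it. The following is a minimal check that assumes a weak (tie-tolerant) definition. `is_unimodal` and `unimodality_rate` are hypothetical helpers, not the paper's evaluation code.

```python
from typing import Sequence

def is_unimodal(probs: Sequence[float], tol: float = 0.0) -> bool:
    """True if probs rise (weakly) up to the first argmax and fall (weakly) after it.

    Assumes a non-empty sequence; tol tolerates small violations from numerical noise.
    """
    peak = max(range(len(probs)), key=lambda i: probs[i])
    rising = all(probs[i] <= probs[i + 1] + tol for i in range(peak))
    falling = all(probs[i] >= probs[i + 1] - tol for i in range(peak, len(probs) - 1))
    return rising and falling

def unimodality_rate(batch: Sequence[Sequence[float]]) -> float:
    """Fraction of predicted distributions in a batch that are unimodal."""
    return sum(is_unimodal(p) for p in batch) / len(batch)
```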
Abstract
Recent studies have shown that deep neural networks are not well calibrated and often produce over-confident predictions. The miscalibration issue stems primarily from the use of cross-entropy in classification, which aims to align predicted softmax probabilities with one-hot labels. In ordinal regression tasks, this problem is compounded by an additional challenge: the expectation that softmax probabilities should exhibit a unimodal distribution is not met by cross-entropy. The ordinal regression literature has focused on learning the order of classes and has overlooked calibration. To address both issues, we propose a novel loss function that introduces ordinal-aware calibration, ensuring that prediction confidence adheres to the ordinal relationships between classes. It incorporates soft ordinal encoding and ordinal-aware regularization to enforce both calibration and unimodality. Extensive experiments across…
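The over-confidence argument can be made concrete with the standard gradient of softmax cross-entropy with respect to the logits. This is a textbook derivation rather than the paper's own gradient analysis. The symbols $z$ (logits), $p$ (softmax output), $q$ (target distribution) and $K$ (number of classes) are notation introduced here.

```latex
p_k = \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}}, \qquad
\mathcal{L}_{CE} = -\sum_{k=1}^{K} q_k \log p_k, \qquad
\frac{\partial \mathcal{L}_{CE}}{\partial z_k} = p_k - q_k \quad \text{since } \sum_{k} q_k = 1
```

With a one-hot target $q = e_y$, the gradient vanishes only when $p_y = 1$. Softmax reaches that value only in the limit $z_y - z_k \to \infty$, so minimization keeps widening the logit margin. Each non-target logit is pushed down in proportion to its own probability $p_k$, regardless of its distance from $y$, so nothing in the loss favors a unimodal shape. With a soft target that has full support, the stationary point is $p = q$, which is reached at finite logits $z_k = \log q_k + c$.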
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
1. A novel regularization term is used to promote unimodality.
2. The paper is well written and easy to read.
1. Theoretical proofs of calibration and unimodality are missing.
1. The problem of calibration in the context of ordinal regression sounds novel and important. As far as I know, this is the first work to address this issue.
2. The empirical improvement is significant. Table 2 shows a large improvement in the calibration of ordinal regression models, while classification accuracy is preserved.
1. The $\mathcal{L}_{REG}$ defined in Equation 2 is not clearly explained. In particular, the design of I(r) is hard for readers to understand. It would be better if the authors elaborated on how the regularization is constructed.
2. The gradient analysis in Subsection 3.4 is not clearly written. The authors may need to improve this part; as it stands, it may be too challenging for readers to follow.
3. The technical novelty of the proposed method is not presented. While the authors claim…
• The motivation behind the new method is well articulated, clearly highlighting the limitations of traditional cross-entropy (CE) loss in ordinal tasks and the miscalibration of modern ordinal regression models.
• The proposed method is straightforward and easy to implement.
1. It is unclear how the regularization component of ORCU promotes calibration. Lines 240-252 discuss scenarios where the model is under- or over-confident, yet this confidence is measured against a soft-encoded distribution that is not directly related to the data, which raises the question of whether it reflects "real" confidence. Additionally, I would appreciate a more rigorous explanation of how this regularization approach aligns with the standard mathematical definition of calibration. Could the authors provi…
- This work addresses an important over-confidence issue in ordinal regression tasks.
- The proposed loss function is intended to address both the accuracy and the confidence of CE-loss-based models during optimization, without additional post-training calibration.
- The authors justify the unimodality enforcement of the proposed loss through gradient analysis.
**Major**:
- The main focus of the work is CE-loss-based ordinal regression, but CE is not an optimal loss for this task, and several methods have been proposed without a CE loss [1-4].
- The motivation in Sec 3.1 is unclear: how calibration is defined, and why it is not implied by the CE loss. The discussion seems valid for the ordered nature of the classes but not for calibration. It would be better to discuss the motivation for each problem separately.
- $\mathcal{L}_{SCE}$ - the explanation in L175-177…
Taxonomy
Topics: Statistical and Computational Modeling · Neural Networks and Applications · Advanced Statistical Methods and Models
Methods: ALIGN · Softmax
