Softmax is not Enough (for Adaptive Conformal Classification)

Navid Akhavan Attar; Hesam Asadollahzadeh; Ling Luo; Uwe Aickelin

arXiv:2602.19498·cs.LG·February 24, 2026

Open Access · 3 Reviews

TL;DR

This paper introduces an energy-based method using Helmholtz Free Energy to improve the adaptiveness and efficiency of conformal prediction sets in deep classifiers, addressing softmax unreliability.

Contribution

It proposes a novel approach that reweights nonconformity scores with energy scores from the pre-softmax logits, enhancing uncertainty estimation in conformal classification.
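As a rough illustration of the idea, the sketch below computes the free energy from logits and plugs an energy-weighted score into ordinary split conformal prediction. The summary above does not give the paper's actual reweighting formula, so the multiplicative softplus weight in `reweighted_scores` is a hypothetical stand-in, and all function names are assumptions made for this example. The base score (one minus the softmax probability) is a common choice, not necessarily the one the authors use.

```python
import numpy as np
from scipy.special import logsumexp, softmax


def free_energy(logits, tau=1.0):
    """Free energy per sample: F(x) = -tau * log sum_k exp(f_k(x) / tau)."""
    return -tau * logsumexp(logits / tau, axis=1)


def reweighted_scores(logits, tau=1.0):
    """Nonconformity scores for every class, shape (n_samples, n_classes).

    ASSUMPTION: the weighting below is illustrative only, not the paper's.
    The weight depends on the input alone (never on the label), so the
    scores stay exchangeable and marginal coverage is preserved.
    """
    base = 1.0 - softmax(logits, axis=1)                    # 1 - softmax probability
    weight = np.logaddexp(0.0, -free_energy(logits, tau))   # softplus(-F) > 0
    return base * weight[:, None]


def calibrate(cal_logits, cal_labels, alpha=0.1, tau=1.0):
    """Split-conformal threshold from a held-out calibration set."""
    n = len(cal_labels)
    scores = reweighted_scores(cal_logits, tau)
    true_scores = scores[np.arange(n), cal_labels]
    # Finite-sample corrected quantile level, clamped to 1 for very small n.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(true_scores, level, method="higher")


def predict_sets(test_logits, qhat, tau=1.0):
    """Boolean matrix: entry (i, k) is True if class k is in the set for sample i."""
    return reweighted_scores(test_logits, tau) <= qhat
```

With this particular weight, low-energy inputs receive larger scores for all classes, so fewer classes pass the threshold and the sets shrink; high-energy inputs get the opposite treatment. Whether that matches the behaviour reported in the paper cannot be verified from this page.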

Findings

1. Improved adaptiveness of prediction sets across multiple datasets.
2. Enhanced efficiency without added post-hoc complexity.
3. Consistent performance gains with state-of-the-art score functions.

Abstract

The merit of Conformal Prediction (CP), as a distribution-free framework for uncertainty quantification, depends on generating prediction sets that are efficient, reflected in small average set sizes, while adaptive, meaning they signal uncertainty by varying in size according to input difficulty. A central limitation for deep conformal classifiers is that the nonconformity scores are derived from softmax outputs, which can be unreliable indicators of how certain the model truly is about a given input, sometimes leading to overconfident misclassifications or undue hesitation. In this work, we argue that this unreliability can be inherited by the prediction sets generated by CP, limiting their capacity for adaptiveness. We propose a new approach that leverages information from the pre-softmax logit space, using the Helmholtz Free Energy as a measure of model uncertainty and sample…
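For readers unfamiliar with the quantity, the free energy of a classifier with logits $f_1(x), \dots, f_K(x)$ is conventionally defined in the energy-based-model literature as shown below. The symbols $f_k$, $K$, and the temperature $\tau$ are notation assumed here; the paper's own conventions may differ.

```latex
F(x;\tau) = -\tau \log \sum_{k=1}^{K} \exp\!\left(\frac{f_k(x)}{\tau}\right),
\qquad
\mathrm{softmax}_k(x;\tau)
  = \frac{\exp\!\left(f_k(x)/\tau\right)}{\sum_{j=1}^{K} \exp\!\left(f_j(x)/\tau\right)}
  = \exp\!\left(\frac{f_k(x) + F(x;\tau)}{\tau}\right).
```

The second identity shows what softmax discards: adding a constant $c$ to every logit leaves each softmax probability unchanged but shifts $F$ by $-c$, so the overall logit scale survives only in the free energy.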

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 4 · Confidence 3

Strengths

1. Addresses a timely and relevant issue in conformal prediction: softmax overconfidence and lack of adaptiveness.
2. The proposed energy-based transformation is simple, intuitive, and easy to integrate into existing CP pipelines.
3. Empirical results are consistent across several datasets and show modest but clear improvements in prediction-set efficiency.

Weaknesses

1. The logical argument connecting “softmax overconfidence” to “inefficient conformal prediction” is not rigorously analyzed; the paper relies mainly on anecdotal intuition rather than formal reasoning.
2. The claim that “monotonic transformations preserve validity” is standard, and no formal analysis of efficiency or adaptive coverage is provided.
3. Experiments only report marginal coverage and average set size, lacking conditional coverage evaluation, significance tests, and ablations on the

Reviewer 02 · Rating 6 · Confidence 3

Strengths

1. Important and Well-Defined Problem: The paper addresses a recognized, significant problem: the deficiency of softmax probabilities for uncertainty quantification and how this directly harms the adaptiveness of Conformal Prediction. The motivation is clear and compelling.
2. Novel and Principled Solution: Using Helmholtz Free Energy (derived from logits, not softmax) as an uncertainty metric is a novel and theoretically grounded (via the EBM framework) idea. This is more robust than relying on heu

Weaknesses

The proposed score design introduces two hyperparameters, $\tau$ and $\beta$. The authors should clarify how these parameters are selected in practice and provide justification for the chosen values.

Reviewer 03 · Rating 6 · Confidence 3

Strengths

- Reweighting conformity scores is a simple and powerful technique to improve CP adaptivity. The possibility of integrating the proposed strategy with existing approaches makes it potentially relevant and easy to use in many practical scenarios.
- The proposed free energy seems to be a good proxy of an instance's difficulty when labels are not available.

Weaknesses

- Reweighting approaches are not new and are mostly used for regression. In the classification setup, [1] proposes a reweighting approach based on the entropy of the class probabilities. The authors may comment on the difference between their approach and that technique.
- As the goal is to show adaptivity, the authors should report some approximate measure of models' conditional coverage.
- The authors may give an intuitive explanation of why *model-implied data density* is a good measure of
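
One widely used approximate measure of conditional coverage is size-stratified coverage: group test points by the size of their prediction set and check coverage within each group. The sketch below is a generic illustration of that metric, not the evaluation protocol of the paper or of any reviewer; the function names and the default bin edges are assumptions. It expects `sets` as a boolean matrix of shape (n_samples, n_classes) and `labels` as integer class indices.

```python
import numpy as np


def marginal_coverage(sets, labels):
    """Fraction of samples whose true label is inside the prediction set."""
    return sets[np.arange(len(labels)), labels].mean()


def size_stratified_coverage(sets, labels, bins=((0, 1), (2, 3), (4, np.inf))):
    """Coverage within each set-size bin (bins are inclusive on both ends).

    Returns a dict mapping each non-empty bin to its empirical coverage.
    A large gap between bins suggests the sets are not adaptive.
    """
    sizes = sets.sum(axis=1)
    covered = sets[np.arange(len(labels)), labels]
    report = {}
    for lo, hi in bins:
        mask = (sizes >= lo) & (sizes <= hi)
        if mask.any():  # skip bins that contain no test points
            report[(lo, hi)] = covered[mask].mean()
    return report
```

The minimum over the returned values gives a single worst-case number that can be reported next to marginal coverage and average set size.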

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Domain Adaptation and Few-Shot Learning