TL;DR
This paper introduces a unified framework for uncertainty quantification in regression tasks using kernel scores, enabling tailored measures for safety-critical applications with clear design guidelines.
Contribution
It develops a family of uncertainty measures based on proper scoring rules, unifying existing metrics and allowing customization via kernel choices for specific task needs.
Findings
Effective in downstream tasks
Demonstrates trade-offs among measures
Improves out-of-distribution detection
Abstract
Regression tasks, notably in safety-critical domains, require proper uncertainty quantification, yet the literature remains largely classification-focused. In this light, we introduce a family of measures for total, aleatoric, and epistemic uncertainty based on proper scoring rules, with a particular emphasis on kernel scores. The framework unifies several well-known measures and provides a principled recipe for designing new ones whose behavior, such as tail sensitivity, robustness, and out-of-distribution responsiveness, is governed by the choice of kernel. We prove explicit correspondences between kernel-score characteristics and downstream behavior, yielding concrete design guidelines for task-specific measures. Extensive experiments demonstrate that these measures are effective in downstream tasks and reveal clear trade-offs among instantiations, including robustness and…
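As a rough illustration of the idea in the abstract, the sketch below computes total, aleatoric, and epistemic uncertainty from a kernel score for an ensemble of predictive distributions represented by samples. It is not taken from the paper. The Gaussian kernel, the bandwidth, the plug-in estimators, and all function names (`rbf_kernel`, `kernel_entropy`, `half_mmd2`, `kernel_uncertainties`) are assumptions chosen for illustration, and the paper's own definitions may differ. Total uncertainty is taken to be the kernel-score entropy of the pooled ensemble, aleatoric uncertainty the average entropy of the members, and epistemic uncertainty the average kernel divergence (half the squared MMD) between each member and the pooled ensemble.

```python
import numpy as np

def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between two 1-D sample arrays (assumed kernel choice)."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / bandwidth) ** 2)

def kernel_entropy(samples, kernel=rbf_kernel):
    """Plug-in estimate of the kernel-score entropy 0.5 * (E k(Y, Y) - E k(Y, Y'))."""
    gram = kernel(samples, samples)
    return 0.5 * (np.mean(np.diag(gram)) - np.mean(gram))

def half_mmd2(p, q, kernel=rbf_kernel):
    """Plug-in estimate of the kernel-score divergence, 0.5 * MMD^2(P, Q)."""
    return 0.5 * (np.mean(kernel(p, p)) + np.mean(kernel(q, q))
                  - 2.0 * np.mean(kernel(p, q)))

def kernel_uncertainties(member_samples, kernel=rbf_kernel):
    """member_samples has shape (M, n): n samples from each of M ensemble members.

    Returns (total, aleatoric, epistemic) under the assumed definitions above.
    """
    pooled = member_samples.reshape(-1)  # samples from the ensemble mixture
    total = kernel_entropy(pooled, kernel)
    aleatoric = np.mean([kernel_entropy(s, kernel) for s in member_samples])
    epistemic = np.mean([half_mmd2(s, pooled, kernel) for s in member_samples])
    return total, aleatoric, epistemic
```

With equally sized members and these plug-in estimates, the three quantities satisfy total = aleatoric + epistemic up to floating-point error. Swapping `rbf_kernel` for another kernel changes how the measures respond to tails and to disagreement between members, which is the design lever the abstract refers to.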
Peer Reviews
Decision: Submitted to ICLR 2026
Aleatoric, epistemic, and total uncertainty are important topics for many applications and require further research. I like the idea of a simple post-processing step for an already trained model to potentially improve the downstream performance. Figure 1 nicely visualizes why changing the uncertainty measure could, in theory, be such a step with high potential for impact. I totally see that it was very well motivated to start this project to find out if this can actually have
**1 Major weakness: The mismatch between the overall storyline and the actual experimental results on real-world data.** The overall storyline is formulated as if (B) is very important for downstream performance (while the majority of the literature only focuses on (A)). However, I don’t see any big impact of (B) on any downstream task in this submission. Let’s go together through all the experiments: 1a Experiment of Section 6.1 QUALITATIVE ASSESSMENT OF UNCERTAINTY QUANTIFICATION I think you
* The paper tackles the important and less-explored problem of principled uncertainty quantification for regression
* The proposed framework, based on kernel scores, is elegant
* The theoretical connection between kernel properties (e.g., boundedness, translation invariance) and the behavior of the resulting uncertainty measures (e.g., robustness via influence functions, ordering properties) is a valuable contribution
* The experiments are comprehensive and well-designed, covering quali
My main concerns are regarding the positioning of the work with respect to recent literature and some aspects of the experimental evaluation.
* **Related Work:** The authors seem to have missed the highly related work of Gruber & Buettner (ICML 2024), who introduce a bias-variance-covariance decomposition of kernel scores to assess generative models. While the application domain is different (generative models vs. regression), the core idea of using kernel scores to derive uncertainty measure
1. The work presents a solid theoretical framework and addresses an interesting problem that could have intriguing implications for downstream applications. It also removes previous limitations related to uncertainty scoring in classification tasks.
2. The mathematical soundness of the paper is strong. The propositions, results, and connections to prior work are solid.
3. The authors did a good job conducting extensive experiments to evaluate the proposed functions and covering various use cases
Overall, the paper represents solid work; however, I have a few minor remarks.
1. There is a missing space on line 147 in “However,since.”
2. The beginning of Part 5 is interestingly formulated. Personally, the notation on lines 232–239 introduced some confusion for me. Why not first define the convex order between any two measures and then incorporate it directly into the proposition? For example, explicitly state in Proposition 5.1 that $Q_1 \leq_{cvx}^2 Q_2$ and that $P_1 \leq_{cvx} P_2$. Or ma
