Conformal Uncertainty Indicator for Continual Test-Time Adaptation
Fan Lyu, Hanyu Zhao, Ziqi Shi, Ye Liu, Fuyuan Hu, Zhang Zhang, Liang Wang

TL;DR
This paper introduces a Conformal Uncertainty Indicator (CUI) for Continual Test-Time Adaptation that uses conformal prediction to estimate uncertainty, dynamically compensates for domain shifts, and improves model adaptation performance.
Contribution
The paper proposes a novel CUI method that enhances CTTA by reliably estimating uncertainty and selectively using pseudo-labels, addressing issues caused by domain shifts.
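The contribution does not spell out the exact selection rule. As a minimal sketch, assume that a pseudo-label counts as reliable only when the conformal prediction set for that sample contains exactly one class. Under that assumption, selective use might look like the code below. The singleton rule, the names `select_pseudo_labels` and `masked_cross_entropy`, and the threshold value `q_hat=0.6` are hypothetical illustrations, not the authors' code.

```python
import math
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax for a single sample's logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def prediction_set(logits: Sequence[float], q_hat: float) -> List[int]:
    """Conformal set: every class whose score 1 - p_k is at most q_hat."""
    return [k for k, p in enumerate(softmax(logits)) if 1.0 - p <= q_hat]


def select_pseudo_labels(
    batch_logits: Sequence[Sequence[float]], q_hat: float
) -> List[Optional[int]]:
    """HYPOTHETICAL rule: keep a pseudo-label only when the set is a singleton.

    A singleton set always holds the argmax class, so the kept label equals the
    model's top prediction; ambiguous (or empty) sets yield None.
    """
    selected: List[Optional[int]] = []
    for logits in batch_logits:
        s = prediction_set(logits, q_hat)
        selected.append(s[0] if len(s) == 1 else None)
    return selected


def masked_cross_entropy(
    batch_logits: Sequence[Sequence[float]], pseudo_labels: Sequence[Optional[int]]
) -> float:
    """Mean cross-entropy over samples that kept a pseudo-label (0.0 if none did)."""
    losses = [
        -math.log(softmax(logits)[y])
        for logits, y in zip(batch_logits, pseudo_labels)
        if y is not None
    ]
    return sum(losses) / len(losses) if losses else 0.0


batch = [[4.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
labels = select_pseudo_labels(batch, q_hat=0.6)  # q_hat chosen arbitrarily here
loss = masked_cross_entropy(batch, labels)
print(labels, round(loss, 4))
```

With this threshold, the logits `[4.0, 0.0, 0.0]` give the singleton set {0} and keep pseudo-label 0. The logits `[1.0, 1.0, 0.0]` give the set {0, 1}, so that sample is left out of the adaptation loss.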
Findings
CUI accurately estimates uncertainty in CTTA.
CUI improves adaptation performance across various methods.
Dynamic coverage compensation enhances the reliability of conformal prediction.
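The findings do not say how the compensation is computed. The abstract states only that it measures domain and data differences. The sketch below illustrates the general idea under a loudly hypothetical rule. It treats the drop in average max-softmax confidence, from the source calibration data to the test batch, as a shift signal. It then lowers the miscoverage level α accordingly, so the conformal threshold, and with it the prediction sets, grows. `mean_confidence`, `compensated_alpha`, `gain`, and `min_alpha` are illustrative names and constants, not the paper's.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax for a single sample's logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_confidence(batch_logits: Sequence[Sequence[float]]) -> float:
    """Average max-softmax probability over a batch."""
    return sum(max(softmax(z)) for z in batch_logits) / len(batch_logits)


def compensated_alpha(
    alpha: float,
    source_conf: float,
    test_conf: float,
    gain: float = 0.25,
    min_alpha: float = 1e-3,
) -> float:
    """HYPOTHETICAL compensation rule, not the paper's formula.

    The confidence drop from source to test stands in for a shift signal delta;
    alpha is lowered by gain * delta (clamped at min_alpha), which raises the
    nominal coverage 1 - alpha and therefore widens the prediction sets.
    """
    delta = max(0.0, source_conf - test_conf)
    return max(min_alpha, alpha - gain * delta)


def threshold_at(scores: Sequence[float], alpha: float) -> float:
    """Split-conformal quantile of calibration scores at level 1 - alpha."""
    s = sorted(scores)
    rank = math.ceil((len(s) + 1) * (1.0 - alpha))
    return math.inf if rank > len(s) else s[rank - 1]


cal_scores = [i / 100 for i in range(100)]  # toy calibration scores
source_conf = 0.9  # assumed confidence level on source data
test_batch = [[3.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
alpha_t = compensated_alpha(0.1, source_conf, mean_confidence(test_batch))
print(alpha_t, threshold_at(cal_scores, 0.1), threshold_at(cal_scores, alpha_t))
```

With these toy numbers, the test batch is less confident than the assumed source level. As a result, α drops from 0.1 to about 0.061, and the threshold rises from 0.90 to 0.94. When the test batch is at least as confident as the source, α is left unchanged.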
Abstract
Continual Test-Time Adaptation (CTTA) aims to adapt models to sequentially changing domains during testing, relying on pseudo-labels for self-adaptation. However, incorrect pseudo-labels can accumulate, leading to performance degradation. To address this, we propose a Conformal Uncertainty Indicator (CUI) for CTTA, leveraging Conformal Prediction (CP) to generate prediction sets that include the true label with a specified coverage probability. Since domain shifts can push coverage below the expected level, making CP unreliable, we dynamically compensate for the coverage by measuring both domain and data differences. Reliable pseudo-labels from CP are then selectively utilized to enhance adaptation. Experiments confirm that CUI effectively estimates uncertainty and improves adaptation performance across various existing CTTA methods.
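Split conformal prediction is the standard construction behind the kind of coverage guarantee the abstract refers to. The sketch below makes two assumptions: it uses the common score s(x, y) = 1 - p_y(x), one minus the softmax probability of the label, and it assumes a labeled source calibration set. The paper may use a different score, and the function names and toy data are illustrative only.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax for a single sample's logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def conformal_threshold(
    cal_logits: Sequence[Sequence[float]], cal_labels: Sequence[int], alpha: float
) -> float:
    """Split-conformal threshold q_hat for the score s(x, y) = 1 - p_y(x).

    Taking the ceil((n + 1)(1 - alpha))-th smallest calibration score gives
    marginal coverage >= 1 - alpha when test data are exchangeable with the
    calibration data, which is the assumption that domain shift breaks.
    """
    scores = sorted(1.0 - softmax(z)[y] for z, y in zip(cal_logits, cal_labels))
    n = len(scores)
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        return math.inf  # too few calibration points: every label is admitted
    return scores[rank - 1]


def prediction_set(logits: Sequence[float], q_hat: float) -> List[int]:
    """All classes whose score 1 - p_k does not exceed q_hat."""
    return [k for k, p in enumerate(softmax(logits)) if 1.0 - p <= q_hat]


def empirical_coverage(
    logits_list: Sequence[Sequence[float]], labels: Sequence[int], q_hat: float
) -> float:
    """Fraction of samples whose prediction set contains the true label."""
    hits = sum(y in prediction_set(z, q_hat) for z, y in zip(logits_list, labels))
    return hits / len(labels)


# Toy calibration data (arbitrary numbers, for illustration only).
cal_logits = [[float(i % 5), 0.0, 1.0] for i in range(20)]
cal_labels = [i % 3 for i in range(20)]
q_hat = conformal_threshold(cal_logits, cal_labels, alpha=0.2)
print(q_hat, empirical_coverage(cal_logits, cal_labels, q_hat))
```

The threshold is drawn from the calibration scores, so coverage on the calibration set itself is at least 1 - α by construction. On a shifted test batch, empirical coverage can fall below that level. This is the failure mode the abstract describes.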
Peer Reviews
Decision · ICLR 2026 Conference Withdrawn Submission
- This paper presents an uncertainty estimator, with theoretical grounding, that indicates how reliable the predictions on test data are.
- The proposed method, CUI, can be combined with different TTA methods at a reasonable computational and resource cost.
- I'm still not convinced that the main problem motivating this paper, the error accumulation induced by updating with unreliable pseudo-labels, is crucial or urgent. When comparing the OOD accuracy of the baselines, which might suffer from error accumulation, with that of the same baselines equipped with CUI, the overall error gaps are marginal (e.g., less than 2% in most of the benchmark cases in Table 1, and the authors did not even report error bars). When considering that the baselines do not involve any additional
* Easy to follow
* Proposes an efficient yet effective solution that can be integrated on top of existing TTA methods.
Comparison against recent baselines: https://openreview.net/pdf?id=USWkUOfxOO
* This is a method that uses a simple technique for calibration, with strong empirical performance. I wish the authors had compared their uncertainty estimation module against it. Furthermore, PseudoCal can also be used to create balanced samples for test-time adaptation WITHOUT additional source-sample memory.
Questions regarding spurious correlations
Comparison against datasets of (https://arxiv.org/abs/2403.07366
1. The paper is clearly written, with a step-by-step presentation of the conformal construction and of how it interfaces with TTA.
2. The paper tackles a timely and important problem: calibration-aware test-time adaptation, aiming to deliver coverage guarantees alongside accuracy.
3. The method is model-agnostic at inference and could, in principle, be layered on top of diverse TTA strategies.
Major weaknesses
1. Missing related work on calibration during adaptation: TEA (Test-time Energy Adaptation) [1], which directly adapts a model via energy minimization and reports effects on calibration.
2. Missing related work on TTA accuracy/uncertainty estimation: AETTA [2], which provides label-free accuracy estimation during TTA. Positioning against AETTA's reliability signals is essential.
3. The paper leverages standard split-conformal machinery with a shift correction term; please cla
1. The paper is clearly written, and the workflow diagrams effectively illustrate the proposed algorithm, making it easy for readers to follow the overall methodology.
2. The proposed approach consistently improves performance across multiple experimental settings and benchmark datasets.
1. Several prior TTA methods, such as EATA [1] and DeYO [2], also mitigate the effect of unreliable supervision signals by down-weighting uncertain samples during loss computation. Unlike the proposed method, these approaches do not require access to any source-domain data. Therefore, the motivation that incorporating source data is necessary to alleviate error accumulation is not entirely convincing.
2. The paper does not include direct performance comparisons with these representative reweighti
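The source-free reweighting that this review contrasts with CUI can be made concrete with a short sketch. It is a simplified reconstruction of EATA-style entropy weighting based on our reading of that method. Samples with entropy H at or above a threshold E0 = 0.4 · ln C are dropped, and the remaining samples are weighted by exp(E0 - H). Treat the constant 0.4 and the function names as assumptions. The sketch omits EATA's redundancy filter and does not implement DeYO's criterion. Unlike CUI, it uses no source data.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax for a single sample's logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_weights(
    batch_logits: Sequence[Sequence[float]], e0_fraction: float = 0.4
) -> List[float]:
    """Simplified EATA-style weighting (a sketch, not EATA's code).

    Samples with entropy >= E0 get weight 0; the rest get exp(E0 - H), so more
    confident samples count more. E0 = e0_fraction * ln(C), C = number of classes.
    """
    weights = []
    for logits in batch_logits:
        e0 = e0_fraction * math.log(len(logits))
        h = entropy(softmax(logits))
        weights.append(math.exp(e0 - h) if h < e0 else 0.0)
    return weights


def weighted_entropy_loss(
    batch_logits: Sequence[Sequence[float]], weights: Sequence[float]
) -> float:
    """Entropy-minimization objective: weighted per-sample entropy, batch mean."""
    terms = [w * entropy(softmax(z)) for z, w in zip(batch_logits, weights)]
    return sum(terms) / len(terms)


batch = [[5.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # one confident, one uniform sample
w = entropy_weights(batch)
print(w, weighted_entropy_loss(batch, w))
```

In this example, the uniform sample has entropy ln 3, which exceeds E0, so it receives weight 0 and contributes nothing to the update. The confident sample receives a weight above 1.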
