TL;DR
This paper introduces AdaConG, a method that adaptively adjusts the influence of guidance signals in machine learning based on their uncertainty, improving robustness and performance across various tasks.
Contribution
AdaConG is a novel approach that dynamically modulates guidance influence using conformal prediction, enhancing learning under noisy or uncertain guidance signals.
Findings
Improves learning robustness in noisy guidance scenarios.
Accelerates convergence and increases rewards in gridworld navigation.
Demonstrates broad applicability across diverse tasks.
Abstract
Learning with guidance has proven effective across a wide range of machine learning systems. Guidance may, for example, come from annotated datasets in supervised learning, pseudo-labels in semi-supervised learning, or expert demonstration policies in reinforcement learning. However, guidance signals can be noisy due to domain shifts and limited data availability, and may not generalize well. Blindly trusting such signals when they are noisy, incomplete, or misaligned with the target domain can lead to degraded performance. To address these challenges, we propose Adaptive Conformal Guidance (AdaConG), a simple yet effective approach that dynamically modulates the influence of guidance signals based on their associated uncertainty, quantified via split conformal prediction (CP). By adaptively adjusting to guidance uncertainty, AdaConG enables models to reduce reliance on potentially…
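A minimal sketch of how split conformal prediction can turn guidance uncertainty into a per-sample weight, assuming a classification-style guidance signal (e.g., a teacher's class probabilities). The nonconformity score 1 - p(y), the use of prediction-set size as the uncertainty u, the mapping h(u) = exp(-u/γ), the empty-set rule, and the KL(teacher || student) direction are illustrative assumptions, not the paper's confirmed choices; all function names are hypothetical.

```python
import numpy as np


def conformal_threshold(cal_scores: np.ndarray, alpha: float) -> float:
    """Split-CP threshold: the k-th smallest calibration score, k = ceil((n+1)(1-alpha))."""
    n = len(cal_scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:  # too few calibration points for this alpha: threshold is unbounded
        return float("inf")
    return float(np.sort(cal_scores)[k - 1])


def guidance_weights(probs: np.ndarray, q_hat: float, gamma: float = 0.5) -> np.ndarray:
    """Per-sample weights w = h(u) derived from conformal prediction-set sizes.

    probs: (batch, K) class probabilities produced by the guidance signal (e.g. a teacher).
    Assumed nonconformity score of label y is 1 - probs[y]; the set keeps labels
    whose score is <= q_hat.
    """
    num_classes = probs.shape[1]
    set_size = ((1.0 - probs) <= q_hat).sum(axis=1)
    # Normalised uncertainty u in [0, 1]: 0 for a singleton set, 1 for the full label set.
    u = (set_size - 1) / max(num_classes - 1, 1)
    # Assumption: an empty set (no label clears the threshold) counts as maximally uncertain.
    u = np.where(set_size == 0, 1.0, u)
    # Hypothetical h(u) = exp(-u / gamma); gamma acts as a temperature.
    return np.exp(-u / gamma)


def weighted_guidance_kl(student: np.ndarray, teacher: np.ndarray, w: np.ndarray,
                         eps: float = 1e-12) -> float:
    """Batch mean of w * KL(teacher || student); eps guards against log(0)."""
    kl = np.sum(teacher * (np.log(teacher + eps) - np.log(student + eps)), axis=1)
    return float(np.mean(w * kl))


if __name__ == "__main__":
    # Toy calibration scores 1 - p_teacher(true label) from a held-out split.
    cal_scores = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7])
    q_hat = conformal_threshold(cal_scores, alpha=0.25)  # k = ceil(8 * 0.75) = 6, so 0.6
    teacher = np.array([[0.9, 0.05, 0.05], [1 / 3, 1 / 3, 1 / 3]])
    student = np.array([[0.7, 0.2, 0.1], [0.5, 0.3, 0.2]])
    w = guidance_weights(teacher, q_hat)  # confident row gets 1.0, uniform row exp(-2)
    print(q_hat, w, weighted_guidance_kl(student, teacher, w))
```

The rank ceil((n+1)(1-α)) is the standard split-CP choice for 1 - α marginal coverage, which holds only if calibration and test points are exchangeable. In this sketch, a confident teacher keeps full influence while a near-uniform teacher's guidance term is down-weighted.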
Peer Reviews
Decision: ICLR 2026 Poster
(1) The idea is elegant, simple, and scales well across SSL, KD, and imitation RL. (2) The experiments are rigorous and cover a broad range of settings. (3) It seems to be very lightweight and model-agnostic compared to MC-dropout-style approaches. (4) The paper is well written and clearly articulated.
Weaknesses and questions: (1) From my understanding, w(s) weighs a KL guidance loss; in the experiments, however, it looks like w(s) chooses actions (stochastic arbitration), and there is also a hard argmax variant. These are different algorithms! Please state this explicitly (if it was not a mistake). (2) \gamma is overloaded: (i) the RL discount, (ii) the temperature in h(u), and (iii) the EMA smoothing factor all use \gamma. (3) Similarly, s is used for both the score and the state. (4) Coverage guarantees require exchangeability, s
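To make the reviewer's point (1) concrete, here is a hypothetical sketch of three ways a weight w in [0, 1] could be used. Only the first weights a guidance loss; the other two use w to arbitrate which action is executed. The function names and the reading of the "hard argmax" variant as an argmax over (1 - w, w) are assumptions for illustration, not the paper's code.

```python
import numpy as np


def soft_weighted_objective(task_loss: float, guidance_kl: float, w: float) -> float:
    """Variant A: w scales a KL guidance term added to the learner's own loss."""
    return task_loss + w * guidance_kl


def stochastic_arbitration(agent_action: int, guide_action: int, w: float,
                           rng: np.random.Generator) -> int:
    """Variant B: execute the guide's action with probability w, else the agent's."""
    return guide_action if rng.random() < w else agent_action


def hard_arbitration(agent_action: int, guide_action: int, w: float) -> int:
    """Variant C (assumed reading): deterministic argmax over (1 - w, w); ties go to the agent."""
    return guide_action if int(np.argmax([1.0 - w, w])) == 1 else agent_action


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(soft_weighted_objective(1.0, 2.0, 0.5))  # 1.0 + 0.5 * 2.0
    picks = [stochastic_arbitration(0, 1, 0.3, rng) for _ in range(10_000)]
    print(np.mean(picks))  # close to 0.3: fraction of steps that followed the guide
    print(hard_arbitration(0, 1, 0.7), hard_arbitration(0, 1, 0.3))
```

Variant A changes the gradient the learner receives, while variants B and C change the behavior policy and therefore which states the learner visits, so results obtained with one do not directly transfer to the others.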
1. The paper is well written and easy to follow. 2. The idea of the paper is straightforward. 3. The experiments on diverse tasks demonstrate the effectiveness of the proposed method.
1. As discussed in the abstract and introduction, learning-with-guidance methods are easily influenced by noisy guidance caused by domain shifts or limited data, which is why accounting for uncertainty is important. However, the datasets and cases used in the experiments seem relatively simple and do not involve distribution shifts or limited data. I suggest adding experiments on these harder cases, e.g., on datasets such as CIFAR-C and ImageNet-C. 2. The method is also evalua
1. The experimental design is comprehensive and rigorous, effectively validating the proposed method's robustness to misleading signals. 2. All experimental results report the mean and standard deviation over multiple runs, demonstrating scientific rigor and reproducibility. 3. The core idea of improving learning quality by modulating guidance signals according to their uncertainty is inspiring, with valuable practical implications across a wide range of tasks. 4. Incorporating conformal prediction to inform
1. I found some of the tables challenging to parse at first glance (e.g., Tables 1 and 4). Reformatting them to be more self-explanatory would improve the paper's readability. 2. I only found an efficiency analysis for the KD task. I would recommend that the authors also compare training time and computational overhead against key baselines in the other tasks.
