Dual Randomized Smoothing: Beyond Global Noise Variance
Chenhao Sun, Yuhao Mao, Martin Vechev

TL;DR
This paper introduces dual randomized smoothing, allowing input-dependent noise variances to improve neural network robustness certification at both small and large adversarial radii, overcoming the limitations of global noise variance.
Contribution
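We propose a dual RS framework with input-dependent noise variances. It is supported by a proof that RS certification remains valid when the variance is locally constant around each input, and by practical training strategies, and it improves robustness certification at both small and large radii.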
Findings
Outperforms prior input-dependent noise methods at multiple radii.
Achieves strong robustness performance on CIFAR-10 and ImageNet datasets.
Provides a routing perspective to improve the accuracy-robustness trade-off.
Abstract
Randomized Smoothing (RS) is a prominent technique for certifying the robustness of neural networks against adversarial perturbations. With RS, achieving high accuracy at small radii requires a small noise variance, while achieving high accuracy at large radii requires a large noise variance. However, the global noise variance used in the standard RS formulation leads to a fundamental limitation: there exists no global noise variance that simultaneously achieves strong performance at both small and large radii. To break through the global variance limitation, we propose a dual RS framework which enables input-dependent noise variances. To achieve that, we first prove that RS remains valid with input-dependent noise variances, provided the variance is locally constant around each input. Building on this result, we introduce two components: (i) a variance estimator predicts an optimal…
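The trade-off between small and large noise levels described in the abstract can be illustrated with the standard RS radius formula. The sketch below assumes the usual Gaussian-noise certificate $R = \sigma \, \Phi^{-1}(\underline{p_A})$ from the original RS formulation, where $\underline{p_A}$ is a lower confidence bound on the top-class probability under noise. The function name `certified_radius`, the bound 0.999, and the candidate list (borrowed from the $\Sigma = \{0.25, 0.5, 1.0\}$ mentioned in the reviews) are illustrative assumptions, not the paper's code.

```python
from statistics import NormalDist

# Candidate noise levels; assumed here to match the set quoted in the reviews.
CANDIDATE_SIGMAS = (0.25, 0.5, 1.0)


def certified_radius(sigma: float, p_lower: float) -> float:
    """Standard Gaussian RS L2 radius: sigma * Phi^{-1}(p_lower).

    p_lower is a lower confidence bound on the top-class probability
    under noise. A bound of 0.5 or less gives no certificate (radius 0.0).
    """
    if p_lower >= 1.0:
        raise ValueError("p_lower must be strictly below 1")
    if p_lower <= 0.5:
        return 0.0
    return sigma * NormalDist().inv_cdf(p_lower)


# Same confidence bound, different noise levels: the radius scales with sigma.
for sigma in CANDIDATE_SIGMAS:
    print(f"sigma={sigma}: radius={certified_radius(sigma, 0.999):.3f}")
```

For a fixed confidence bound the radius grows linearly with $\sigma$, so a small $\sigma$ limits the attainable radius, while a large $\sigma$ makes the noisy inputs harder to classify and tends to lower accuracy at small radii. This is the tension a single global $\sigma$ cannot resolve.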
Peer Reviews
Decision: ICLR 2026 Poster
The paper provides a rigorous theoretical foundation by proving that RS certification remains valid with locally constant noise variances (Theorems 4.1 and 4.2). This generalizes the original RS framework and opens new possibilities for adaptive certification methods. The paper also provides a new training methodology using soft labels based on certified radius quality rather than hard labels for variance estimation. The proposed iterative training scheme, which alternates between learning the
The evaluation is restricted to CIFAR-10. The paper would benefit from experiments on larger datasets (e.g., ImageNet) and across other domains to demonstrate generalizability. The training process requires substantial computational resources (1517 GPU hours total, with 703 hours just for building the optimal variance dataset). This high cost may limit practical adoption. As the framework relies on a discrete set of candidate variances $\Sigma = \{0.25, 0.5, 1.0\}$, there should be ablation stu
The theoretical contribution is clear, rigorous, and well-motivated. Previous input-dependent RS methods (e.g., Súkeník et al., 2022; Alfarra et al., 2022) were conceptually appealing but failed to provide valid certification due to the dependence of $\sigma(x)$ on the evaluation point. Here, the authors convincingly fix this flaw by proving that local constancy of $\sigma(x)$ is sufficient for correctness. The proof is clean, self-contained, and does not rely on unreviewed external results. I f
The framework introduces double certification: one for the classifier and one for the variance estimator. While theoretically sound, this adds non-trivial complexity and sampling cost. More importantly, ensuring or certifying local constancy of $\sigma(x)$ can be difficult as the input space scales up. In high-dimensional domains such as ImageNet, verifying that $\sigma(x)$ is approximately constant in a local neighborhood is challenging, and the accuracy of the second-stage certification will h
1. The paper identifies a well-known limitation of traditional RS and provides a theoretically justified extension to input-dependent noise while maintaining certification validity. 2. The decomposition into a variance estimator and a classifier, with the option to interpret it as a routing system among expert models, is conceptually clean and practical.
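The routing view in point 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `dual_certify`, the `Certifier` type, and the toy stand-ins are hypothetical names, and both certifiers are treated as black boxes that return a prediction with a certified radius. Taking the minimum of the two radii is an assumption that follows the local-constancy argument: inside the ball where the selected variance provably does not change, the classifier certificate for that variance applies.

```python
from typing import Callable, Dict, Sequence, Tuple

ABSTAIN = -1
# A certifier maps an input to (prediction, certified L2 radius).
Certifier = Callable[[Sequence[float]], Tuple[int, float]]


def dual_certify(
    x: Sequence[float],
    certify_sigma: Certifier,
    experts: Dict[float, Certifier],
    sigmas: Sequence[float],
) -> Tuple[int, float]:
    """Route x to the expert for its selected sigma and combine certificates."""
    # Stage 1: certify that the selected variance index is constant near x.
    sigma_idx, r_sigma = certify_sigma(x)
    if sigma_idx == ABSTAIN:
        return ABSTAIN, 0.0
    # Stage 2: certify the class with the expert for that variance.
    label, r_cls = experts[sigmas[sigma_idx]](x)
    if label == ABSTAIN:
        return ABSTAIN, 0.0
    # Assumption: the final radius is limited by the weaker certificate.
    return label, min(r_sigma, r_cls)


# Toy stand-ins (hypothetical): the estimator always picks index 2 (sigma=1.0)
# with radius 1.2; each expert predicts class 7 with radius 3.0 * sigma.
sigmas = (0.25, 0.5, 1.0)
toy_estimator: Certifier = lambda x: (2, 1.2)
toy_experts: Dict[float, Certifier] = {
    s: (lambda x, s=s: (7, 3.0 * s)) for s in sigmas
}
print(dual_certify([0.0], toy_estimator, toy_experts, sigmas))  # (7, 1.2)
```

In this toy run the estimator's radius (1.2) is smaller than the expert's (3.0), so it bounds the final certificate.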
1. The core idea, using input-dependent noise variance in randomized smoothing, has already been discussed in several prior works. The theoretical extension to “locally constant variance” is incremental rather than fundamentally new. 2. The paper repeatedly claims distinctions from prior approaches in different sections (sections 4 and 5), but these differences are scattered and qualitative. I would like to suggest that the authors add a clear summary table comparing key assumptions, theoretica
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
