Learning Better Certified Models from Empirically-Robust Teachers
Alessandro De Palma

TL;DR
This paper introduces a method that improves certifiably robust neural networks by distilling knowledge from empirically-robust (adversarially-trained) teachers, improving the trade-off between certified robustness and standard accuracy.
Contribution
It proposes a novel feature-space distillation approach from adversarially-trained teachers to improve certified robustness in neural networks.
Findings
Distillation from robust teachers improves certified robustness.
Achieves state-of-the-art results on certified robustness benchmarks.
Enhances the trade-off between robustness and accuracy.
Abstract
Adversarial training attains strong empirical robustness to specific adversarial attacks by training on concrete adversarial perturbations, but it produces neural networks that are not amenable to strong robustness certificates through neural network verification. On the other hand, earlier certified training schemes directly train on bounds from network relaxations to obtain models that are certifiably robust, but display sub-par standard performance. Recent work has shown that state-of-the-art trade-offs between certified robustness and standard performance can be obtained through a family of losses combining adversarial outputs and neural network bounds. Nevertheless, differently from empirical robustness, verifiability still comes at a significant cost in standard performance. In this work, we propose to leverage empirically-robust teachers to improve the performance of…
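The "family of losses combining adversarial outputs and neural network bounds" mentioned in the abstract can be illustrated with a small sketch. The code below is an assumption-laden toy, not the paper's implementation: it propagates interval bounds (IBP) through one affine layer and a ReLU, then forms a convex combination of an adversarial output and a bound using a coefficient `alpha`. The function names (`affine_bounds`, `relu_bounds`, `affine`, `convex_combination`) are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def affine_bounds(W: Matrix, b: Vector, lo: Vector, hi: Vector) -> Tuple[Vector, Vector]:
    """Interval bounds on W x + b over all x with lo <= x <= hi (elementwise)."""
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        low = high = bias
        for w, l, h in zip(row, lo, hi):
            # A non-negative weight is minimised at l and maximised at h;
            # a negative weight is minimised at h and maximised at l.
            low += w * l if w >= 0 else w * h
            high += w * h if w >= 0 else w * l
        out_lo.append(low)
        out_hi.append(high)
    return out_lo, out_hi


def relu_bounds(lo: Vector, hi: Vector) -> Tuple[Vector, Vector]:
    """ReLU is monotone, so it can be applied to each bound separately."""
    return [max(0.0, v) for v in lo], [max(0.0, v) for v in hi]


def affine(W: Matrix, b: Vector, x: Vector) -> Vector:
    """Plain forward pass W x + b for a single concrete input."""
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]


def convex_combination(adv: Vector, bound: Vector, alpha: float) -> Vector:
    """(1 - alpha) * adv + alpha * bound.

    alpha = 0 recovers the adversarial output (adversarial-training style),
    alpha = 1 recovers the bound (pure bound-based certified training).
    """
    return [(1.0 - alpha) * a + alpha * c for a, c in zip(adv, bound)]
```

In this sketch, a loss evaluated on `convex_combination(adv_output, lower_bound, alpha)` moves smoothly between an adversarial loss and a bound-based loss as `alpha` goes from 0 to 1; how the paper defines its outputs, bounds, and loss is not shown in the truncated abstract.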
Peer Reviews
Decision: Submitted to ICLR 2026
++ I generally like the idea: although adversarial training does not generally yield provably robust models, this may not be a problem of adversarial training itself but a limitation of current verifiers, which cannot certify the robustness of such models. Smoothly incorporating adversarially trained models into provably robust learning can get the best of both sides. ++ The algorithm is straightforward and the manuscript is well written: compared with CC-IBP, it only adds a single hyper-parameter, which is not very s
1. The gaps between the proposed method and the baselines in Table 1 are relatively small, so it would be better to run the experiments multiple times and report the performance variance. 2. PGD-40 is used as the metric for empirical robustness. However, the most reliable empirical robustness evaluation scheme is AutoAttack, as used in RobustBench. It would be better to use AutoAttack instead. 3. The scope is a bit limited, as the results and discussions are demonstrated for $l_\infty$ bounded pertur
1. This work achieves a tight theoretical coupling between feature-space distillation and expressive certifiable training (CC-IBP). The distillation target is constructed as a convex combination of adversarial features and their IBP lower and upper bounds, and the authors prove that the distillation term upper-bounds the worst-case feature-level risk while varying continuously and monotonically with the interpolation parameter α. Under the affine classifier assumption, α jointly regulates both t
1. In scenarios with large perturbation radii, such as ε = 8/255 on CIFAR-10, the ReLU-based CC-Dist has yet to surpass specialized 1-Lipschitz architectures such as SortNet. This suggests that, although the approach advances the accuracy–certificate frontier in most settings, it has not fundamentally bridged the structural gap in large-ε regimes. Moreover, both theoretical and experimental analyses mainly focus on ReLU activations and IBP, while the adaptation and potential benefits across othe
1. Section 2 is a nice description of the background. 2. The experiment section includes a number of experiments.
I think there are three main weaknesses: presentation, motivation, and experiments. ## 1. Presentation The presentation could be greatly improved. Even though I am pretty familiar with the subject, I feel overwhelmed by the notations in this manuscript that look very similar to one another. I cannot see why it is necessary to add CC as a subscript before everything, and those $\theta_h$, $\theta_h^t$ etc. are all very confusing. I think the idea is pretty simple. Basically there are two ways t
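One of the reviews above describes the distillation term as built from a convex combination of adversarial features and their IBP lower and upper bounds, varying continuously and monotonically with the interpolation parameter α. A minimal sketch of that idea follows. It rests on assumptions not confirmed by the source: the squared Euclidean distance, the function names (`interpolate_box`, `worst_case_sq_distance`, `distillation_term`), and the exact way the box is interpolated are all hypothetical choices for illustration, not the paper's definition.

```python
from typing import List, Tuple

Vector = List[float]


def interpolate_box(adv: Vector, lo: Vector, hi: Vector, alpha: float) -> Tuple[Vector, Vector]:
    """Interpolate between the adversarial student feature and its IBP box.

    alpha = 0 collapses the box onto adv; alpha = 1 returns the full box [lo, hi].
    Assumes lo <= adv <= hi elementwise, so the boxes are nested as alpha grows.
    """
    new_lo = [(1.0 - alpha) * a + alpha * l for a, l in zip(adv, lo)]
    new_hi = [(1.0 - alpha) * a + alpha * h for a, h in zip(adv, hi)]
    return new_lo, new_hi


def worst_case_sq_distance(teacher: Vector, lo: Vector, hi: Vector) -> float:
    """Largest squared distance between the teacher feature and any point in [lo, hi].

    The squared distance is convex per coordinate, so its maximum over an
    interval is attained at one of the two endpoints.
    """
    return sum(max((l - t) ** 2, (h - t) ** 2) for t, l, h in zip(teacher, lo, hi))


def distillation_term(teacher: Vector, adv: Vector, lo: Vector, hi: Vector, alpha: float) -> float:
    """Hypothetical feature-space distillation term on the interpolated box."""
    box_lo, box_hi = interpolate_box(adv, lo, hi, alpha)
    return worst_case_sq_distance(teacher, box_lo, box_hi)
```

Under these assumptions, `distillation_term` equals the plain distance between teacher and adversarial student features at α = 0, equals the worst case over the full IBP box at α = 1, and is non-decreasing in α because the interpolated boxes are nested.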
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Advanced Neural Network Applications
