Average Certified Radius is a Poor Metric for Randomized Smoothing
Chenhao Sun, Yuhao Mao, Mark Niklas Müller, Martin Vechev

TL;DR
This paper critically evaluates the average certified radius (ACR) metric in randomized smoothing, demonstrating its flaws and proposing alternative evaluation strategies for robustness guarantees.
Contribution
It provides the first theoretical and empirical critique of ACR, showing its limitations and introducing alternative metrics and strategies for evaluating robustness in randomized smoothing.
Findings
ACR can be arbitrarily large for trivial classifiers
ACR is highly sensitive to improvements on easy samples
Existing strategies improving ACR may reduce robustness on hard samples
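The sensitivity to easy samples can be illustrated with a minimal sketch. It assumes the standard randomized-smoothing L2 radius, sigma * Phi^{-1}(p_A), where p_A is the probability that the base classifier returns the correct class under Gaussian noise; that formula is not stated on this page. The function name `certified_radius` and the chosen probabilities are hypothetical, picked only for illustration.

```python
from statistics import NormalDist

def certified_radius(p_a: float, sigma: float) -> float:
    """Assumed L2 radius sigma * Phi^{-1}(p_a); 0 if the top class has no majority."""
    if p_a <= 0.5:
        return 0.0
    return sigma * NormalDist().inv_cdf(p_a)

sigma = 0.5  # illustrative noise level, not taken from the paper

# Hard sample: p_A moves by ten percentage points, from 0.5 to 0.6.
hard_gain = certified_radius(0.6, sigma) - certified_radius(0.5, sigma)

# Easy sample: p_A moves by less than 0.1 percentage points, from 0.999 to 0.99999.
easy_gain = certified_radius(0.99999, sigma) - certified_radius(0.999, sigma)

print(f"radius gain on hard sample: {hard_gain:.3f}")
print(f"radius gain on easy sample: {easy_gain:.3f}")
```

Because Phi^{-1} grows steeply as p_A approaches 1, the tiny change on the easy sample adds more radius than the large change on the hard sample, so an average over radii rewards the former more.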
Abstract
Randomized smoothing (RS) is popular for providing certified robustness guarantees against adversarial attacks. The average certified radius (ACR) has emerged as a widely used metric for tracking progress in RS. In this work, we show for the first time that ACR is a poor metric for evaluating the robustness guarantees provided by RS. We prove theoretically not only that a trivial classifier can have arbitrarily large ACR, but also that ACR is extremely sensitive to improvements on easy samples. In addition, comparisons based on ACR depend strongly on the certification budget. Empirically, we confirm that existing training strategies, though improving ACR, consistently reduce the model's robustness on hard samples. To strengthen our findings, we propose strategies, including explicitly discarding hard samples, reweighting the dataset with approximate certified radius, and…
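One way to see how a trivial classifier can reach a large ACR, and why the certification budget matters, is the sketch below. It is not the paper's proof. It assumes a constant classifier on a class-balanced test set, the radius formula sigma * Phi^{-1}(p_lower), a one-sided Clopper-Pearson lower bound (which equals alpha ** (1/n) when all n noisy samples agree), and an ACR that counts radius 0 for misclassified inputs. The names `trivial_radius` and `trivial_acr`, and the default values, are hypothetical.

```python
from statistics import NormalDist

def trivial_radius(n: int, sigma: float, alpha: float = 0.001) -> float:
    """Radius certified for a constant classifier with n noisy samples.

    All n votes agree, so the one-sided Clopper-Pearson lower bound on the
    top-class probability is alpha ** (1 / n).
    """
    p_lower = alpha ** (1.0 / n)
    if p_lower <= 0.5:
        return 0.0
    return sigma * NormalDist().inv_cdf(p_lower)

def trivial_acr(n: int, sigma: float, num_classes: int = 10, alpha: float = 0.001) -> float:
    """ACR of the constant classifier on a balanced test set (assumption).

    Only the 1 / num_classes fraction of inputs whose label matches the
    constant prediction contributes a nonzero radius.
    """
    return trivial_radius(n, sigma, alpha) / num_classes

# ACR grows with both the sampling budget n and the noise level sigma.
for n in (1_000, 100_000):
    for sigma in (0.5, 5.0, 50.0):
        print(f"n={n:>7} sigma={sigma:>5}: ACR={trivial_acr(n, sigma):.3f}")
```

Under these assumptions the ACR scales linearly with sigma and increases with n, even though the classifier ignores its input.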
Taxonomy
Topics: Data Management and Algorithms · Artificial Intelligence in Games · Advanced Multi-Objective Optimization Algorithms
