Towards Certified Probabilistic Robustness with High Accuracy
Ruihan Zhang, Peixin Zhang, Jun Sun

TL;DR
This paper introduces a novel training and inference approach that achieves high accuracy and certified probabilistic robustness in neural networks, effectively balancing robustness and performance.
Contribution
It proposes a new probabilistic robust training method combined with an efficient runtime certification technique, addressing the trade-off between accuracy and certified robustness.
Findings
Outperforms existing methods in certification rate
Maintains high accuracy while providing probabilistic robustness
Works efficiently across various models and datasets
Abstract
Adversarial examples pose a security threat to many critical systems built on neural networks (such as face recognition systems and self-driving cars). While many methods have been proposed to build robust models, how to build certifiably robust yet accurate neural network models remains an open problem. For example, adversarial training improves empirical robustness, but it does not provide certification of the model's robustness. On the other hand, certified training provides certified robustness but at the cost of a significant accuracy drop. In this work, we propose a novel approach that aims to achieve both high accuracy and certified probabilistic robustness. Our method has two parts, i.e., a probabilistic robust training method with an additional goal of minimizing variance in terms of divergence and a runtime inference method for certified probabilistic robustness of the…
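The training idea in the abstract, as one reviewer below paraphrases it, is to minimize a combination of the expectation and the variance of the loss over a neighborhood of each input. The following is a minimal NumPy sketch of such an objective, not the authors' formulation: the abstract measures variance "in terms of divergence", whereas this sketch uses the plain variance of the cross-entropy loss. The names `mean_variance_loss`, `logits_fn`, `eps`, `n_samples`, and `lam`, and the choice of uniform sampling in an L-infinity ball, are all assumptions made for illustration.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def mean_variance_loss(logits_fn, x, y, eps=0.1, n_samples=64, lam=1.0, rng=None):
    """Monte Carlo estimate of E[loss] + lam * Var[loss] around x.

    logits_fn: hypothetical model, maps a batch (n, d) to logits (n, k).
    x: single input of shape (d,); y: integer class label.
    eps: radius of the L-infinity neighborhood (assumed uniform sampling).
    """
    rng = np.random.default_rng(0) if rng is None else rng
    # Draw perturbed copies of x uniformly from the eps-ball.
    noise = rng.uniform(-eps, eps, size=(n_samples,) + x.shape)
    probs = softmax(logits_fn(x + noise))
    # Per-sample cross-entropy against the true label.
    losses = -np.log(probs[:, y] + 1e-12)
    # Penalize both the average loss and its spread over the neighborhood.
    return losses.mean() + lam * losses.var()


# Toy usage with a hypothetical linear classifier.
W = np.array([[2.0, -1.0], [0.5, 1.0]])
linear_model = lambda batch: batch @ W
x0 = np.array([1.0, 0.0])
print(mean_variance_loss(linear_model, x0, y=0, eps=0.1))
```

Setting `lam=0` recovers an ordinary expected-loss objective over the neighborhood; a larger `lam` pushes the model toward behaving uniformly across the sampled perturbations.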
Peer Reviews
Decision · ICLR 2024 Conference Withdrawn Submission
* Training for deterministic robustness can cause over-regularization and yield practically useless models. Randomized smoothing offers guarantees while reducing the loss of accuracy but the resulting model has a large inference overhead. The submission aims to overcome these important limitations of prior work by achieving less inference overhead, probabilistic guarantees, and minimizing accuracy loss.
* The particular combination of the training, certification, and inference algorithms for
* The comparison in Table 1 using the proposed robustness metric is misleading. The guarantees provided by deterministic methods are fundamentally different from probabilistic ones. The probabilistic guarantees for local robustness with $\kappa=0.01$ are quite weak for high-dimensional inputs (going from $\kappa=0.01$ to $0.0001$ can be hard). If the authors insist on making such a comparison, then I would suggest doing the following: (i) Count the total number of images within each adversarial
- The objective of minimizing a combination of expectation and variance of loss over the neighborhood of a point is a sensible (and to my knowledge novel) approach to achieving probabilistic robustness.
- The empirical evaluations seem strong, with state-of-the-art certified probabilistic robustness. The results on adversarial robustness (Table 3) are particularly impressive, appearing to significantly outperform even methods tailored specifically to achieve adversarial (rather than probabilisti
- The novelty of the paper, aside from the variance training objective, is unclear. In particular, the suggested inference method (selecting the majority decision among sampled points) seems to just be randomized smoothing. Further, it is unclear how the certification approach compares to similar methods (Baluta et al. 2021, Zhang et al. 2023). I would suggest that the authors add an explicit related work section to clarify these points (as well as contextualize the work more thoroughly).
- The
The authors of this paper have clearly placed a significant amount of effort into ensuring they comprehensively surveyed the state of the literature for certification mechanisms (although I did find it a little odd that foundational references like Lecuyer et al. on DP for RS were missing).
Broadly, my concerns relate to: the nature of the experimental comparisons constructed (and some scepticism about the associated results); non-standard evaluation measures for certification (not looking at the relationship between certification sizes and accuracy); and a writing style that makes the nature of the contributions difficult to parse (and a framing that is potentially disguising the similarities between the developed works and prior techniques; see the point about P > 0.5 below).
There are a few main concerns with the paper (see Questions below) that need to be addressed. They may be based on a misunderstanding of the approach, and so appropriate answers from the authors would lead me to raise my score.
My main concerns are raised in the Questions section. But some minor details:
- In Section 2 it was stated initially that h outputs a label, and yet later in the same paragraph this was changed to logits. Precision and consistency are required.
- Writing G_x = arg max h(x), for example, needs to make it very clear what argument is being searched over (the vector index of the logit output here).
- It is claimed in the introduction that randomised smoothing suffers from "significant accuracy loss". Thi
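For readers unfamiliar with the inference step the reviewers discuss (selecting the majority decision among sampled points), the sketch below shows one way it could look. It is an illustration under assumptions, not the paper's algorithm: `majority_vote_predict`, `predict_fn`, `eps`, and `n_samples` are hypothetical names, and sampling is assumed uniform in an L-infinity ball. The returned disagreement rate is only an empirical estimate of the fraction of the neighborhood labeled differently; turning it into a certificate for a tolerance such as $\kappa=0.01$ would additionally require a statistical confidence bound, which is not shown here.

```python
import numpy as np


def majority_vote_predict(predict_fn, x, eps=0.1, n_samples=200, rng=None):
    """Return the majority label over sampled neighbors and the disagreement rate.

    predict_fn: hypothetical classifier, maps a batch (n, d) to integer labels (n,).
    x: single input of shape (d,).
    """
    rng = np.random.default_rng(0) if rng is None else rng
    # Sample perturbed inputs uniformly from the eps-ball around x (an assumption).
    noise = rng.uniform(-eps, eps, size=(n_samples,) + x.shape)
    labels = predict_fn(x + noise)
    # Count votes per class and pick the most frequent label.
    counts = np.bincount(labels)
    top = int(counts.argmax())
    # Fraction of samples that disagree with the majority label.
    disagreement = 1.0 - counts[top] / n_samples
    return top, disagreement


# Toy usage: a hypothetical classifier that thresholds the sum of the features.
toy_classifier = lambda batch: (batch.sum(axis=1) > 0).astype(int)
label, rate = majority_vote_predict(toy_classifier, np.array([1.0, 1.0]), eps=0.1)
print(label, rate)
```

Each prediction costs `n_samples` forward passes, which is the inference overhead the first review attributes to randomized smoothing.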
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)
