SPLITZ: Certifiable Robustness via Split Lipschitz Randomized Smoothing
Meiyu Zhong, Ravi Tandon

TL;DR
SPLITZ introduces a novel framework combining Lipschitz constraints and randomized smoothing to enhance certifiable robustness of classifiers against adversarial perturbations, demonstrating improved accuracy on multiple datasets.
Contribution
It proposes SPLITZ, a new method that splits classifiers into two parts, constrains the first, and smooths the second, leveraging heterogeneity in deep networks for better robustness guarantees.
Findings
SPLITZ outperforms existing methods on MNIST, CIFAR-10, and ImageNet.
Achieves 43.2% top-1 accuracy on CIFAR-10 with $\ell_2$ perturbation budget $\epsilon = 1$.
Provides theoretical robustness guarantees during inference.
Abstract
Certifiable robustness gives the guarantee that small perturbations around an input to a classifier will not change the prediction. There are two approaches to provide certifiable robustness to adversarial examples: a) explicitly training classifiers with small Lipschitz constants, and b) Randomized smoothing, which adds random noise to the input to create a smooth classifier. We propose SPLITZ, a practical and novel approach which leverages the synergistic benefits of both the above ideas into a single framework. Our main idea is to split a classifier into two halves, constrain the Lipschitz constant of the first half, and smooth the second half via randomization. Motivation for SPLITZ comes from the observation that many standard deep networks exhibit heterogeneity in Lipschitz constants across layers. SPLITZ can exploit this heterogeneity while inheriting the scalability of…
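The split idea in the abstract can be illustrated with a small sketch of how a certified radius might compose across the two halves. This is an assumption-laden illustration, not the authors' implementation: it assumes the second half is smoothed with Gaussian noise of standard deviation `sigma` and certified with the standard Gaussian randomized-smoothing $\ell_2$ bound, and that the first half has a known Lipschitz bound `lipschitz` with respect to the $\ell_2$ norm. The function names `latent_radius` and `input_radius` are hypothetical. Under those assumptions, a perturbation of size $r$ at the input moves the latent representation by at most `lipschitz * r`, so a radius certified in latent space shrinks by a factor of `lipschitz` when mapped back to the input.

```python
from statistics import NormalDist


def latent_radius(p_a: float, p_b: float, sigma: float) -> float:
    """l2 radius certified for a Gaussian-smoothed classifier (assumed bound).

    p_a: lower bound on the probability of the top class under noise.
    p_b: upper bound on the probability of the runner-up class under noise.
    Both must lie strictly between 0 and 1, otherwise inv_cdf raises an error.
    """
    inv = NormalDist().inv_cdf  # inverse CDF of the standard normal
    return 0.5 * sigma * (inv(p_a) - inv(p_b))


def input_radius(p_a: float, p_b: float, sigma: float, lipschitz: float) -> float:
    """Map the latent-space radius back to input space (hypothetical helper).

    If the first half h satisfies ||h(x + d) - h(x)|| <= lipschitz * ||d||,
    then any input perturbation with ||d|| <= latent_radius / lipschitz
    stays inside the certified latent ball.
    """
    if lipschitz <= 0:
        raise ValueError("lipschitz must be positive")
    return latent_radius(p_a, p_b, sigma) / lipschitz


if __name__ == "__main__":
    # Illustrative numbers only: a smaller Lipschitz bound on the first half
    # yields a larger certified radius at the input.
    for L in (0.5, 1.0, 2.0):
        print(L, input_radius(p_a=0.9, p_b=0.1, sigma=0.5, lipschitz=L))
```

The sketch uses a single global Lipschitz bound for simplicity. The reviews mention an optimization over $\gamma$ in the paper, which this example does not model.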
Peer Reviews
Decision·ICLR 2024 Conference Withdrawn Submission
1. The proposed method is more scalable than RS. 2. The idea of exploiting heterogeneity in Lipschitz constants across layers is interesting. 3. Numerical study is quite comprehensive.
1. The idea of combining Lipschitz networks and RS may not be that original. In general, Lipschitz training is not the only way to restrict the Lipschitz constant of networks. One can enforce various network structures so a prescribed Lipschitz constant is ensured. The following paper, which appeared in 2021, combines orthogonal Lipschitz layers with RS: Huimin Zeng, Jiahao Su, and Furong Huang. Certified defense via latent space randomized smoothing with orthogonal encoders. arXiv, 2021. Therefor
1. The ideas in the paper are articulated with clarity and are easy to follow. 2. The paper proposes a novel technique that combines two established methods with considerable efficacy. 3. Along with theoretical robustness guarantees, it proposes a training procedure to optimize the robustness criteria (Lipschitzness for the first part and robustness to random noise for the second) needed for this method. 4. It outperforms state-of-the-art techniques of certified robustness for MNIST and CIFAR-10
1. The method does not consistently outperform existing approaches for ImageNet. Specifically, it does not perform as well as DDS, which leverages additional data to improve robustness. Despite this, the improvement for smaller datasets is noteworthy. Keeping this in mind, I am leaning toward accepting this paper.
The idea behind the proposed approach is novel and conceptually simple/intuitive: to the best of my knowledge, this is the first work combining Lipschitz-based certified training schemes with randomized smoothing. The paper is mostly well-written, with a clear presentation of the required technical background (sometimes in the appendix) and of the main technical building blocks of SPLITZ. What stands out the most, though, is the experimental section, showing that SPLITZ outperforms previous app
To my mind, the main weakness of the work lies in the introduction of a fair number of hyper-parameters (train-time $\gamma$, $\theta$, $\lambda$), which will inevitably increase the overall runtime overhead of the proposed approach. Analogously, it would be nice to see a detailed analysis of the overhead incurred by the optimization over $\gamma$ (remark 1). In addition, I think that the presentation itself could be somewhat improved in a couple of instances. For instance, the authors repeated
This idea is interesting, and given that Lipschitz continuity and randomized smoothing are two well-established methods for certifiable robustness, it is interesting to work toward a unification of the two approaches.
**Major weakness: I believe the paper to be flawed** - The results on CIFAR10 show an increase of 21.9 points of certified robustness for eps = 1 with the L2 norm compared to the state of the art. This increase is extremely high and, in my opinion, suspicious. I took the time to check the code and it seems that the authors normalize the inputs of the model. The authors mention this in Appendix E.2 DETAILS OF DATASETS. After a review of the code, it seems that the authors use a function called
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Face and Expression Recognition · Machine Learning and Data Classification
