A Recipe for Improved Certifiable Robustness
Kai Hu, Klas Leino, Zifan Wang, Matt Fredrikson

TL;DR
This paper improves the certifiable robustness of neural networks by refining Lipschitz-based methods, introducing new architectural components and data augmentation that significantly raise verified robust accuracy on standard benchmarks.
Contribution
The work introduces a comprehensive evaluation and novel design techniques, including Cholesky-orthogonalized residual layers, to advance Lipschitz-based certification methods.
Findings
Verified robust accuracy (VRA) under deterministic certification improves significantly across benchmark datasets.
Cholesky-orthogonalized residual layers add network capacity while respecting the Lipschitz constraint.
The state-of-the-art deterministic VRA improves by up to 8.5 percentage points.
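The findings name Cholesky-orthogonalized layers, but this summary does not spell out the construction. The following is a minimal NumPy sketch of one standard way to orthogonalize a weight matrix with a Cholesky factorization: given a full-row-rank matrix V, take the lower-triangular factor L of V V^T and return W = L^{-1} V, whose rows are then orthonormal. The function name `cholesky_orthogonalize`, the matrix shape, and the random input are assumptions made for illustration; how the paper wires this into its residual layers is not shown here.

```python
import numpy as np


def cholesky_orthogonalize(v: np.ndarray) -> np.ndarray:
    """Return W = L^{-1} V, where L is the Cholesky factor of V V^T.

    Assumes V has shape (n, m) with n <= m and full row rank, so that
    V V^T is symmetric positive definite.
    """
    gram = v @ v.T                    # (n, n) Gram matrix of the rows of V
    lower = np.linalg.cholesky(gram)  # lower-triangular L with L L^T = V V^T
    return np.linalg.solve(lower, v)  # solves L W = V, i.e. W = L^{-1} V


rng = np.random.default_rng(0)
v = rng.standard_normal((4, 6))  # hypothetical unconstrained parameter matrix
w = cholesky_orthogonalize(v)

# The rows of W are orthonormal, so W W^T is numerically the identity
# and the spectral norm (largest singular value) of W is 1.
print(np.allclose(w @ w.T, np.eye(4)))
print(np.isclose(np.linalg.norm(w, 2), 1.0))
```

Because W W^T = L^{-1} (V V^T) L^{-T} = I, the resulting linear map is 1-Lipschitz in the l2 norm, which is the property a Lipschitz-constrained layer needs.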
Abstract
Recent studies have highlighted the potential of Lipschitz-based methods for training certifiably robust neural networks against adversarial attacks. A key challenge, supported both theoretically and empirically, is that robustness demands greater network capacity and more data than standard training. However, effectively adding capacity under stringent Lipschitz constraints has proven more difficult than it may seem, as evidenced by the fact that state-of-the-art approaches tend more towards underfitting than overfitting. Moreover, we posit that a lack of careful exploration of the design space for Lipschitz-based approaches has left potential performance gains on the table. In this work, we provide a more comprehensive evaluation to better uncover the potential of Lipschitz-based certification methods. Using a combination of novel techniques, design optimizations, and synthesis of…
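For readers unfamiliar with Lipschitz-based certification, the generic idea can be sketched as follows. If the map from inputs to logits is K-Lipschitz in the l2 norm, the gap between any two logits can change by at most sqrt(2) * K * eps under a perturbation of norm eps, so a prediction is certified whenever the margin between the top two logits exceeds that amount. This is the textbook bound, not necessarily the exact certification procedure used in the paper; the function names, logits, bound, and radius below are all hypothetical.

```python
import math


def certified_radius(logits, lipschitz_bound):
    """l2 radius within which the top-1 prediction provably cannot change.

    Assumes `lipschitz_bound` upper-bounds the l2 Lipschitz constant of the
    whole input-to-logits map and that there are at least two logits.
    """
    ranked = sorted(logits, reverse=True)
    margin = ranked[0] - ranked[1]  # gap between top logit and runner-up
    return margin / (math.sqrt(2.0) * lipschitz_bound)


def is_certified(logits, lipschitz_bound, epsilon):
    """True if the prediction is certified at perturbation size epsilon."""
    return certified_radius(logits, lipschitz_bound) >= epsilon


# Hypothetical values chosen only to exercise both outcomes.
print(is_certified([3.0, 1.0, 0.5], lipschitz_bound=1.0, epsilon=0.5))
print(is_certified([1.0, 0.9], lipschitz_bound=1.0, epsilon=0.5))
```

Verified robust accuracy is then the fraction of test points that are both correctly classified and certified at the chosen epsilon.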
Peer Reviews
Decision·ICLR 2024 poster
* By combining technical improvements on three aspects as mentioned in the summary, the paper shows a significant empirical improvement over previous works across all the datasets (e.g., +8% on CIFAR-10).
* This work provides suggestions on better settings for the robust training, in terms of model architecture with additional layers, building orthogonal layers with Cholesky decomposition, and data augmentation with a newer diffusion model.
* The paper reads like a manual search over settings (model architecture, orthogonal layers, diffusion model). It has engineering merits, but adding more dense layers and replacing the diffusion model already used in Hu et al., 2023 with a newer one is not a very novel contribution.
* The benefits of the best choices found by the paper are not well explained. For example, the paper only explains that the Cholesky-based orthogonalization is more numerically stable and fast
1. This work studies the limitations of Lipschitz-based certification and proposes new architectures to mitigate them.
2. Strong empirical result: experiments show a noticeable improvement over the baseline models.
The authors need to include some intuitions when designing the layers.
It finds that an apparent limitation preventing prior work from discovering the full potential of Lipschitz-based certification stems from the framing and evaluation setup. Specifically, most prior work is framed around a particular novel technique intended to supersede the state-of-the-art, necessitating evaluations centered on standardized benchmark hyperparameter design spaces, rather than exploring more general methods for improving performance (e.g., architecture choice, data pipeline, etc.
In section 4.3, it seems to mainly discuss the comparison with RS based methods. But Table 5 shows several other works which can achieve better performance. It is better to also discuss the comparison with these works. Currently it seems that table 5 only shows the results without detailed discussions for these works.
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Ferroelectric and Negative Capacitance Devices · Advanced Memory and Neural Computing
Methods: Average Pooling · Kaiming Initialization · 1x1 Convolution · Batch Normalization · Convolution · Residual Block · Residual Connection · Global Average Pooling · Max Pooling
