Expressive Losses for Verified Robustness via Convex Combinations
Alessandro De Palma, Rudy Bunel, Krishnamurthy Dvijotham, M. Pawan Kumar, Robert Stanforth, Alessio Lomuscio

TL;DR
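This paper introduces the concept of expressive loss functions, which span a range of trade-offs between lower and upper bounds to the worst-case loss through a single parameter. Simple instances built as convex combinations of adversarial attacks and IBP bounds improve the trade-off between accuracy and robustness in verified adversarial training, achieving state-of-the-art results.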
Contribution
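It formalizes the expressivity of a loss function as its ability to interpolate between lower and upper bounds to the worst-case loss via a single over-approximation coefficient, and demonstrates that convex-combination losses with this property improve robustness-accuracy trade-offs.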
Findings
Convex combination losses achieve state-of-the-art robustness.
Expressivity of loss functions is crucial for performance.
Better worst-case loss approximations do not always improve robustness.
Abstract
In order to train networks for verified adversarial robustness, it is common to over-approximate the worst-case loss over perturbation regions, resulting in networks that attain verifiability at the expense of standard performance. As shown in recent work, better trade-offs between accuracy and robustness can be obtained by carefully coupling adversarial training with over-approximations. We hypothesize that the expressivity of a loss function, which we formalize as the ability to span a range of trade-offs between lower and upper bounds to the worst-case loss through a single parameter (the over-approximation coefficient), is key to attaining state-of-the-art performance. To support our hypothesis, we show that trivial expressive losses, obtained via convex combinations between adversarial attacks and IBP bounds, yield state-of-the-art results across a variety of settings in spite of…
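The idea of a convex combination between an adversarial loss and an IBP loss can be illustrated with a minimal sketch. It assumes the combination is taken directly at the loss level and that the two loss values have already been computed; the paper's concrete losses may combine the two terms differently. The function name `convex_combination_loss` and the argument names are hypothetical, chosen only for illustration.

```python
def convex_combination_loss(adv_loss: float, ibp_loss: float, alpha: float) -> float:
    """Interpolate between a lower and an upper bound to the worst-case loss.

    adv_loss: loss at an adversarial example (a lower bound to the worst-case loss).
    ibp_loss: loss computed from IBP bounds (an upper bound to the worst-case loss).
    alpha:    over-approximation coefficient in [0, 1] (hypothetical argument name).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # alpha = 0 recovers the adversarial loss, alpha = 1 recovers the IBP loss.
    return (1.0 - alpha) * adv_loss + alpha * ibp_loss


if __name__ == "__main__":
    # Illustrative values only; they are not results from the paper.
    for alpha in (0.0, 0.5, 1.0):
        print(alpha, convex_combination_loss(0.5, 2.0, alpha))
```

Sweeping `alpha` from 0 to 1 moves the training objective continuously from pure adversarial training to pure IBP training, which is the single-parameter range of trade-offs that the abstract refers to as expressivity.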
Peer Reviews
Decision: ICLR 2024 poster
- The motivation of the paper is sound, and the underlying theory regarding certified training remains unknown and challenging.
- The paper is generally well-organized and easy to follow.
- The experiments are comprehensive and different datasets and attack radii are used for the evaluation.
- My biggest concern is that the contribution and novelty of the paper, which concern the expressivity of losses, are incremental and minor. It seems to borrow the idea of the previous work SABR, which gives an effective loss ranging between the adversarial loss and the verified loss. The difference between this work and SABR is not that clear or significant, as SABR can induce expressivity by letting $\lambda=\alpha$ as shown in Sec. 3.
- Some key details are not given in the main
The empirical results seem strong, especially considering the fact that the proposed methods are simple interpolations.
The presentation and writing need a lot of work, and the paper seems hurriedly written. Specific concerns are below. The mathematical definition of property P in Eq. 1 is given in Section 2 without any discussion of what it means or entails and why it is interesting or useful. You could add at least one example of how x_adv could be generated in the background section. Explicitly write down what “verification” means before using it in Section 2.1. I don’t know what the follo
- The presentation is mostly good.
- Training certifiable networks is a relevant research problem.
- The ideas are conceptually simple yet seem to be effective.
- The work unifies and generalizes successful approaches.
- The authors provided code.
- It remains unclear how stable the results are (for example, w.r.t. different seeds).
- Writing could be improved in some parts of the paper, i.e., Section 6.3.
- It remains unclear which "tricks" (e.g., for regularization and initialization) were specifically used.
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications
