On Achieving Optimal Adversarial Test Error
Justin D. Li, Matus Telgarsky

TL;DR
This paper characterizes the properties of optimal adversarial predictors, establishes bounds relating different loss types, and proves that adversarial training with shallow networks and early stopping can achieve optimal adversarial test error.
Contribution
It gives a theoretical characterization of optimal adversarial predictors and shows that adversarial training on shallow networks, with early stopping and an idealized optimal adversary, can attain optimal adversarial test error.
Findings
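Optimal adversarial convex predictors can be described in terms of optimal adversarial zero-one predictors, and the adversarial convex loss is related to the adversarial zero-one loss by explicit bounds.
Continuous predictors can get arbitrarily close to the optimal adversarial error for both convex and zero-one losses.
Adversarial training on shallow networks with early stopping and an idealized optimal adversary achieves optimal adversarial test error for general data distributions and perturbation sets.
New Rademacher complexity bounds for adversarial training near initialization support these guarantees.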
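To make the two loss types in these findings concrete, the adversarial risks can be written as below. The notation is an assumption for illustration and is not taken from the paper: labels y in {-1, +1}, a perturbation set P(x) around each input, and a convex nonincreasing surrogate loss such as the logistic loss.

```latex
% Assumed notation: (x, y) ~ D with y in {-1, +1}; P(x) is the perturbation set of x;
% \ell is a convex nonincreasing surrogate, e.g. \ell(z) = \ln(1 + e^{-z}).
\[
  R^{\mathrm{adv}}_{\ell}(f)
    = \mathbb{E}_{(x,y)\sim\mathcal{D}}
      \Big[\sup_{x' \in \mathcal{P}(x)} \ell\big(y f(x')\big)\Big],
  \qquad
  R^{\mathrm{adv}}_{0\text{-}1}(f)
    = \mathbb{E}_{(x,y)\sim\mathcal{D}}
      \Big[\sup_{x' \in \mathcal{P}(x)} \mathbf{1}\big[y f(x') \le 0\big]\Big].
\]
```

Under this reading, an optimal adversarial predictor is one that attains the infimum of the corresponding risk over all measurable predictors, and "optimal adversarial test error" refers to that infimum for the zero-one risk.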
Abstract
We first elucidate various fundamental properties of optimal adversarial predictors: the structure of optimal adversarial convex predictors in terms of optimal adversarial zero-one predictors, bounds relating the adversarial convex loss to the adversarial zero-one loss, and the fact that continuous predictors can get arbitrarily close to the optimal adversarial error for both convex and zero-one losses. Applying these results along with new Rademacher complexity bounds for adversarial training near initialization, we prove that for general data distributions and perturbation sets, adversarial training on shallow networks with early stopping and an idealized optimal adversary is able to achieve optimal adversarial test error. By contrast, prior theoretical work either considered specialized data distributions or only provided training error guarantees.
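The training procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the idealized optimal adversary is replaced by a few steps of projected sign-gradient ascent over an l-infinity ball, early stopping is implemented by keeping the iterate with the lowest held-out adversarial loss, and all function names, the width `m`, the radius `eps`, and the step sizes are hypothetical choices.

```python
import numpy as np

def forward(W, a, X):
    # Shallow ReLU network: f(x) = a . relu(W x) / sqrt(m)
    return np.maximum(X @ W.T, 0.0) @ a / np.sqrt(len(a))

def grads(W, a, X, y):
    # Gradients of the logistic loss ln(1 + exp(-y f(x))):
    # averaged over examples for W, per example for the inputs X.
    act = (X @ W.T > 0).astype(float)            # ReLU derivative, shape (n, m)
    margin = y * forward(W, a, X)
    dl = -y / (1.0 + np.exp(margin))             # d loss / d f, shape (n,)
    G = dl[:, None] * act * a[None, :] / np.sqrt(len(a))
    return G.T @ X / len(y), G @ W               # (m, d), (n, d)

def pgd_attack(W, a, X, y, eps, steps=5):
    # Approximate adversary (assumption): sign-gradient ascent on the loss,
    # projected back onto the l-infinity ball of radius eps around X.
    X_adv = X.copy()
    for _ in range(steps):
        _, gX = grads(W, a, X_adv, y)
        X_adv = np.clip(X_adv + (2.0 * eps / steps) * np.sign(gX), X - eps, X + eps)
    return X_adv

def adv_loss(W, a, X, y, eps):
    # Mean logistic loss on the (approximately) worst-case perturbed inputs.
    X_adv = pgd_attack(W, a, X, y, eps)
    return np.mean(np.logaddexp(0.0, -y * forward(W, a, X_adv)))

def adversarial_train(X, y, X_val, y_val, eps=0.1, m=64, lr=0.5, max_steps=200, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((m, X.shape[1]))     # trained hidden layer
    a = rng.choice([-1.0, 1.0], size=m)          # outer layer kept fixed
    best_W, best_val = W.copy(), adv_loss(W, a, X_val, y_val, eps)
    for _ in range(max_steps):
        X_adv = pgd_attack(W, a, X, y, eps)      # inner maximization (approximate)
        gW, _ = grads(W, a, X_adv, y)
        W = W - lr * gW                          # gradient step on adversarial examples
        val = adv_loss(W, a, X_val, y_val, eps)
        if val < best_val:                       # early stopping: keep best held-out iterate
            best_W, best_val = W.copy(), val
    return best_W, a, best_val
```

A small step budget keeps the weights close to their random initialization, which is the regime the abstract's Rademacher complexity bounds address. The paper's guarantee assumes an exact inner maximization, so the heuristic attack here only approximates that setting.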
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications
Methods: Early Stopping
