The Consistency of Adversarial Training for Binary Classification
Natalie S. Frank, Jonathan Niles-Weed

TL;DR
This paper investigates the statistical consistency of adversarial training for binary classification and identifies when minimizing a supremum-based surrogate risk reliably drives down the adversarial classification risk.
Contribution
It characterizes which supremum-based surrogate risks are consistent in the adversarial setting, for distributions absolutely continuous with respect to Lebesgue measure, and establishes quantitative bounds relating these surrogate risks to the adversarial classification risk.
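For reference, the two risks being compared can be written as below. This is the standard formulation from the adversarial-learning literature, and the notation ($\phi$ a margin loss, $\overline{B_\epsilon}(x)$ the closed $\epsilon$-ball around $x$, $\mathcal{D}$ the data distribution, errors counted when $y f(x') \le 0$) is an assumption here rather than a transcription of the paper's statements.

```latex
R_\phi^\epsilon(f) = \mathbb{E}_{(x,y)\sim\mathcal{D}}\Big[\sup_{x' \in \overline{B_\epsilon}(x)} \phi\big(y f(x')\big)\Big],
\qquad
R^\epsilon(f) = \mathbb{E}_{(x,y)\sim\mathcal{D}}\Big[\sup_{x' \in \overline{B_\epsilon}(x)} \mathbf{1}\{y f(x') \le 0\}\Big]

% Consistency (usual definition): for every sequence of measurable f_n,
R_\phi^\epsilon(f_n) \to \inf_f R_\phi^\epsilon(f)
\;\Longrightarrow\;
R^\epsilon(f_n) \to \inf_f R^\epsilon(f)
```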
Findings
Identifies conditions for surrogate risk consistency in adversarial training
Provides bounds relating surrogate risks to the adversarial classification risk
Discusses implications for the $\mathcal{H}$-consistency of adversarial training
Abstract
Robustness to adversarial perturbations is of paramount concern in modern machine learning. One of the state-of-the-art methods for training robust classifiers is adversarial training, which involves minimizing a supremum-based surrogate risk. The statistical consistency of surrogate risks is well understood in the context of standard machine learning, but not in the adversarial setting. In this paper, we characterize which supremum-based surrogates are consistent for distributions absolutely continuous with respect to Lebesgue measure in binary classification. Furthermore, we obtain quantitative bounds relating adversarial surrogate risks to the adversarial classification risk. Lastly, we discuss implications for the $\mathcal{H}$-consistency of adversarial training.
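To make "supremum-based surrogate risk" concrete, here is a minimal sketch that computes the empirical adversarial surrogate risk and the empirical adversarial classification risk for a toy problem. It assumes a linear classifier $f(x) = w \cdot x + b$, $\ell_2$-ball perturbations of radius $\epsilon$, and a non-increasing margin loss; under these assumptions the inner supremum has the closed form $\phi(y f(x) - \epsilon \lVert w \rVert_2)$. The function names, the data, and the choice of hinge and logistic losses are hypothetical illustrations, not the paper's method or its bounds.

```python
import numpy as np

# Two margin losses, both non-increasing in the margin t = y * f(x).
def hinge(t):
    return np.maximum(0.0, 1.0 - t)

def logistic(t):
    # log(1 + exp(-t)), computed stably
    return np.logaddexp(0.0, -t)

def worst_case_margins(X, y, w, b, eps):
    """Smallest margin y * f(x') over the closed l2 ball of radius eps around
    each x, for the assumed linear model f(x) = w @ x + b.
    Since y is +1 or -1, this equals y * f(x) - eps * ||w||_2."""
    return y * (X @ w + b) - eps * np.linalg.norm(w)

def adv_surrogate_risk(X, y, w, b, eps, phi):
    """Empirical supremum-based surrogate risk. Because phi is non-increasing,
    the sup of phi over the ball is phi at the worst-case margin."""
    return float(np.mean(phi(worst_case_margins(X, y, w, b, eps))))

def adv_classification_risk(X, y, w, b, eps):
    """Empirical adversarial 0-1 risk, counting y * f(x') <= 0 as an error
    (a tie-breaking convention assumed for this sketch)."""
    return float(np.mean(worst_case_margins(X, y, w, b, eps) <= 0))

# Toy 1-D data (hypothetical), classified by f(x) = x.
X = np.array([[2.0], [0.5], [-1.0], [-3.0]])
y = np.array([1, 1, -1, -1])
w, b = np.array([1.0]), 0.0

print(adv_classification_risk(X, y, w, b, eps=0.0))         # 0.0 (clean error)
print(adv_classification_risk(X, y, w, b, eps=1.0))         # 0.5
print(adv_surrogate_risk(X, y, w, b, eps=1.0, phi=hinge))   # 0.625
```

Because the hinge loss satisfies $\phi(t) \ge \mathbf{1}\{t \le 0\}$, its adversarial surrogate risk upper-bounds the adversarial 0-1 risk for every $\epsilon$. Whether driving such a surrogate to its infimum also drives the 0-1 risk to its infimum is the consistency question the paper studies; this sketch does not settle it.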
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Machine Learning and Algorithms · Anomaly Detection Techniques and Applications
