Characterizing the Optimal 0-1 Loss for Multi-class Classification with a Test-time Attacker
Sihui Dai, Wenxin Ding, Arjun Nitin Bhagoji, Daniel Cullina, Ben Y. Zhao, Haitao Zheng, Prateek Mittal

TL;DR
This paper establishes theoretical lower bounds on the 0-1 loss for multi-class classifiers under test-time adversarial attacks, providing a framework to evaluate and compare classifier robustness.
Contribution
It introduces a novel hypergraph-based framework to determine the optimal 0-1 loss under adversarial constraints for multi-class classification.
Findings
First analysis of the gap to optimal robustness on benchmark datasets.
Framework applicable to any discrete dataset and threat model.
Provides bounds that serve as diagnostics for classifier robustness.
Abstract
Finding classifiers robust to adversarial examples is critical for their safe deployment. Determining the robustness of the best possible classifier under a given threat model for a given data distribution and comparing it to that achieved by state-of-the-art training methods is thus an important diagnostic tool. In this paper, we find achievable information-theoretic lower bounds on loss in the presence of a test-time attacker for multi-class classifiers on any discrete dataset. We provide a general framework for finding the optimal 0-1 loss that revolves around the construction of a conflict hypergraph from the data and adversarial constraints. We further define other variants of the attacker-classifier game that determine the range of the optimal loss more efficiently than the full-fledged hypergraph construction. Our evaluation shows, for the first time, an analysis of the gap to…
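To make the idea of a conflict structure concrete, here is a minimal sketch restricted to pairwise conflicts (edges between two points), not the full hypergraph described in the abstract. It assumes an L2 threat model with budget `eps`, under which two differently labeled points conflict when their perturbation balls overlap, that is, when their distance is at most `2 * eps`. The function names `conflict_edges` and `greedy_matching_loss_bound` are hypothetical and are not taken from the paper. The bound relies on a simple observation: in any set of disjoint conflicting pairs, the attacker can prevent at least one point per pair from being classified correctly, so `m` disjoint pairs among `n` points force a 0-1 loss of at least `m / n`. A greedy matching gives a valid but possibly loose bound.

```python
import math
from itertools import combinations


def conflict_edges(points, labels, eps):
    """Return index pairs (i, j) with different labels whose L2 eps-balls overlap.

    Assumption: the attacker may move each point by at most eps in L2 norm,
    so two balls intersect exactly when the centers are within 2 * eps.
    """
    edges = []
    for i, j in combinations(range(len(points)), 2):
        if labels[i] != labels[j] and math.dist(points[i], points[j]) <= 2 * eps:
            edges.append((i, j))
    return edges


def greedy_matching_loss_bound(n, edges):
    """Lower-bound the optimal 0-1 loss using a greedy matching on conflict edges.

    Each matched pair contributes at most one correctly classified point,
    so a matching of size m implies loss >= m / n.
    """
    matched = set()
    m = 0
    for i, j in edges:
        if i not in matched and j not in matched:
            matched.update((i, j))
            m += 1
    return m / n


# Toy 1-D dataset with three classes (illustrative values only).
points = [(0.0,), (1.0,), (5.0,), (5.5,)]
labels = [0, 1, 0, 2]
edges = conflict_edges(points, labels, eps=0.5)
bound = greedy_matching_loss_bound(len(points), edges)
```

On this toy data, points 0 and 1 conflict, as do points 2 and 3, so no classifier can be robustly correct on more than half of the points. Higher-degree conflicts, which the paper's hypergraph captures, can only tighten such a bound further.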
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Bacillus and Francisella bacterial research
