Efficient and Robust Classification for Sparse Attacks
Mark Beliaev, Payam Delgosha, Hamed Hassani, Ramtin Pedarsani

TL;DR
This paper introduces a novel defense method combining truncation and adversarial training to improve the robustness of neural networks against sparse, $\ell_0$-norm bounded attacks, with theoretical and empirical validation.
Contribution
It proposes a new robust classification approach for $\ell_0$ attacks, including theoretical analysis and practical extensions to neural networks.
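To make the "truncation" idea concrete, here is a minimal sketch of one way a truncated linear score could work. It rests on an assumption not spelled out in this summary: that truncation means dropping the k largest and k smallest coordinate-wise products before summing, so that an attacker who changes at most k coordinates cannot dominate the score. The names `truncated_inner_product` and `classify`, the weights, and the toy inputs are all hypothetical and are not taken from the paper.

```python
def truncated_inner_product(w, x, k):
    """Sum the products w[i] * x[i] after discarding the k largest and
    k smallest of them. Hypothetical helper; the trimming rule is an
    assumption made for illustration."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same length")
    if k < 0 or 2 * k >= len(w):
        raise ValueError("need k >= 0 and 2 * k < len(w)")
    products = sorted(wi * xi for wi, xi in zip(w, x))
    return sum(products[k:len(products) - k])


def classify(w, x, k):
    """Binary label in {-1, +1} from the sign of the truncated score."""
    return 1 if truncated_inner_product(w, x, k) >= 0 else -1


# Toy data (made up): one coordinate of a clean input is overwritten,
# which corresponds to an l0 perturbation budget of 1.
w = [1.0, 1.0, 1.0, 1.0, 1.0]
x_clean = [1.0, 2.0, 1.5, 0.5, 1.0]
x_attacked = [1.0, 2.0, -100.0, 0.5, 1.0]

plain_score = sum(wi * xi for wi, xi in zip(w, x_attacked))
robust_score = truncated_inner_product(w, x_attacked, k=1)

print("plain score:", plain_score)
print("truncated score:", robust_score)
print("label without truncation:", classify(w, x_attacked, k=0))
print("label with truncation:", classify(w, x_attacked, k=1))
```

In this toy case the single corrupted coordinate flips the sign of the ordinary inner product, while the truncated score stays positive because the outlying product is discarded. The cost is that truncation also discards some clean coordinates, so k trades clean accuracy against robustness.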
Findings
Theoretically proves asymptotic optimality in Gaussian mixture models.
Demonstrates significant reduction in robust classification error on MNIST and CIFAR datasets.
Shows improved robustness of neural networks against sparse perturbations.
Abstract
In the past two decades we have seen the popularity of neural networks increase in conjunction with their classification accuracy. Parallel to this, we have also witnessed how fragile the very same prediction models are: tiny perturbations to the inputs can cause misclassification errors throughout entire datasets. In this paper, we consider perturbations bounded by the $\ell_0$-norm, which have been shown to be effective attacks in the domains of image recognition, natural language processing, and malware detection. To this end, we propose a novel defense method that consists of "truncation" and "adversarial training". We then theoretically study the Gaussian mixture setting and prove the asymptotic optimality of our proposed classifier. Motivated by the insights we obtain, we extend these components to neural network classifiers. We conduct numerical experiments in the domain of…
Taxonomy
Topics: Adversarial Robustness in Machine Learning
