Divide, Denoise, and Defend against Adversarial Attacks
Seyed-Mohsen Moosavi-Dezfooli, Ashish Shrivastava, Oncel Tuzel

TL;DR
The paper introduces D3, a non-differentiable defense that divides an input image into patches, denoises each patch independently, and reconstructs the image, improving neural network robustness to adversarial attacks without fine-tuning on adversarial examples.
Contribution
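It proposes D3, a defense that denoises the input patch by patch before classification. Because the defense is non-differentiable, gradient-based attacks are non-trivial to apply against it, and because the network is not fine-tuned on adversarial examples, the defense is not tied to any known attack. The summary reports strong empirical results on ImageNet.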
Findings
Outperforms the state of the art by 19.7% in the grey-box setting
Achieves 34.4% accuracy under white-box attacks
Performs comparably to top methods in the black-box setting
Abstract
Deep neural networks, although shown to be a successful class of machine learning algorithms, are known to be extremely unstable to adversarial perturbations. Improving the robustness of neural networks against these attacks is important, especially for security-critical applications. To defend against such attacks, we propose dividing the input image into multiple patches, denoising each patch independently, and reconstructing the image, without losing significant image content. We call our method D3. This proposed defense mechanism is non-differentiable which makes it non-trivial for an adversary to apply gradient-based attacks. Moreover, we do not fine-tune the network with adversarial examples, making it more robust against unknown attacks. We present an analysis of the tradeoff between accuracy and robustness against adversarial attacks. We evaluate our method under black-box,…
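The divide, denoise, and reconstruct steps described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract excerpt does not specify the denoiser, so the sketch assumes a stand-in that keeps only the largest-magnitude DCT coefficients of each patch. The function names and the `patch`, `stride`, and `keep` parameters are hypothetical, and the image is assumed to be a 2-D grayscale array at least `patch` pixels on each side.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II basis of size n x n (rows are basis vectors)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0, :] /= np.sqrt(2.0)
    return m


def denoise_patch(block, basis, keep):
    """Stand-in denoiser (an assumption): keep the `keep` largest-magnitude
    DCT coefficients of the patch and zero out the rest."""
    coeffs = basis @ block @ basis.T
    magnitudes = np.abs(coeffs).ravel()
    if keep < magnitudes.size:
        # Value of the keep-th largest magnitude; ties may keep a few more.
        threshold = np.partition(magnitudes, -keep)[-keep]
        coeffs = np.where(np.abs(coeffs) >= threshold, coeffs, 0.0)
    return basis.T @ coeffs @ basis


def patch_starts(length, patch, stride):
    """Start offsets along one axis, with a final patch flush to the border."""
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] != length - patch:
        starts.append(length - patch)
    return starts


def divide_denoise_reconstruct(image, patch=8, stride=4, keep=8):
    """Divide into overlapping patches, denoise each one independently,
    and reconstruct by averaging the overlapping outputs."""
    h, w = image.shape
    basis = dct_matrix(patch)
    out = np.zeros((h, w), dtype=np.float64)
    weight = np.zeros((h, w), dtype=np.float64)
    for y in patch_starts(h, patch, stride):
        for x in patch_starts(w, patch, stride):
            block = image[y:y + patch, x:x + patch].astype(np.float64)
            out[y:y + patch, x:x + patch] += denoise_patch(block, basis, keep)
            weight[y:y + patch, x:x + patch] += 1.0
    return out / weight
```

In this sketch, a smaller `keep` removes more of the perturbation but also more image content, which is one simple way to see the accuracy versus robustness tradeoff the abstract mentions. The hard top-k selection is also not smoothly differentiable, which loosely mirrors the property the abstract highlights.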
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Domain Adaptation and Few-Shot Learning
