On the Effectiveness of Small Input Noise for Defending Against Query-based Black-Box Attacks
Junyoung Byun, Hyojun Go, Changick Kim

TL;DR
Small Noise Defense (SND) is a simple method that adds a small amount of random noise to each input to disrupt query-based black-box attacks on neural networks, reducing attack success while largely preserving model accuracy and inference speed.
Contribution
Proposes SND, a novel defense mechanism that uses small input noise to hinder black-box adversarial attacks without significant performance loss.
Findings
SND neutralizes most query-based attacks on CIFAR-10 and ImageNet.
SND maintains high classification accuracy and computational speed.
SND is easily applicable to pre-trained models with minimal code changes.
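To illustrate the "minimal code changes" claim, here is a minimal sketch of an input-noise wrapper in plain Python. It is not the authors' implementation: the Gaussian noise distribution, the value `sigma=0.01`, and the names `with_small_noise` and `toy_model` are assumptions made for illustration only.

```python
import random

def with_small_noise(model, sigma=0.01, rng=None):
    """Wrap `model` so that every query is evaluated on x + eta,
    where each element of eta is drawn from N(0, sigma^2).
    The Gaussian choice and sigma value are assumptions of this sketch."""
    rng = rng or random.Random()

    def noisy_model(x):
        # Fresh noise is drawn on every call, so repeated queries differ.
        noisy_x = [xi + rng.gauss(0.0, sigma) for xi in x]
        return model(noisy_x)

    return noisy_model

def toy_model(x):
    # Hypothetical stand-in for a pre-trained classifier: a fixed linear score.
    weights = [0.5, -1.0, 2.0]
    return sum(w * xi for w, xi in zip(weights, x))

defended = with_small_noise(toy_model, sigma=0.01, rng=random.Random(0))

x = [0.2, 0.4, 0.6]
clean_score = toy_model(x)   # deterministic output of the undefended model
first = defended(x)          # same input, noisy output
second = defended(x)         # same input again, different noisy output
```

The wrapped model stays close to the clean output because the noise is small, but an attacker who sends the same query twice no longer receives the same answer.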
Abstract
While deep neural networks show unprecedented performance in various tasks, the vulnerability to adversarial examples hinders their deployment in safety-critical systems. Many studies have shown that attacks are also possible even in a black-box setting where an adversary cannot access the target model's internal information. Most black-box attacks are based on queries, each of which obtains the target model's output for an input, and many recent studies focus on reducing the number of required queries. In this paper, we pay attention to an implicit assumption of query-based black-box adversarial attacks that the target model's output exactly corresponds to the query input. If some randomness is introduced into the model, it can break the assumption, and thus, query-based attacks may have tremendous difficulty in both gradient estimation and local search, which are the core of their…
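The effect on gradient estimation described in the abstract can be sketched with a one-dimensional toy example. This is an illustration under stated assumptions, not the paper's experiment: the function `score`, the finite-difference estimator, the step `h`, and the noise level `sigma` are all hypothetical choices.

```python
import random

def score(x):
    # Hypothetical smooth model output; its true derivative at x is 2 * x.
    return x * x

def noisy_score(x, sigma, rng):
    # Same model, but the input is perturbed by N(0, sigma^2) noise per query.
    return score(x + rng.gauss(0.0, sigma))

def finite_difference(f, x, h):
    # Central-difference gradient estimate built from two queries.
    return (f(x + h) - f(x - h)) / (2.0 * h)

rng = random.Random(0)
x0, h, sigma = 1.0, 1e-3, 1e-2
true_gradient = 2.0 * x0

# Without input noise, two queries recover the gradient almost exactly.
clean_estimate = finite_difference(score, x0, h)

# With input noise, each query sees an independent perturbation, so the
# difference of the two outputs is dominated by noise when sigma >> h.
noisy_estimates = [
    finite_difference(lambda x: noisy_score(x, sigma, rng), x0, h)
    for _ in range(200)
]
mean_abs_error = sum(abs(g - true_gradient) for g in noisy_estimates) / len(noisy_estimates)
```

In this sketch the error of the noisy estimate scales roughly with `sigma / h`, so an attacker who uses a small step to probe the model gets estimates that are far from the true gradient, while the model's output on any single query changes only slightly.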
Taxonomy
Topics: Adversarial Robustness in Machine Learning
