Fight Perturbations with Perturbations: Defending Adversarial Attacks via Neuron Influence
Ruoxi Chen, Haibo Jin, Haibin Zheng, Jinyin Chen, Zhenguang Liu

TL;DR
This paper introduces a novel defense method against adversarial attacks called Neuron-level Inverse Perturbation (NIP), which enhances model robustness by manipulating neuron influence to counteract attack effects.
Contribution
The paper proposes the concept of neuron influence and develops NIP, a proactive defense that modifies inputs based on neuron influence to improve robustness against various adversarial attacks.
Findings
NIP effectively strengthens neurons with larger influence.
NIP reduces the success rate of adversarial attacks.
Neuron influence correlates with attack success.
Abstract
The vulnerability of deep learning models to adversarial attacks has attracted increasing attention, especially when models are deployed in security-critical domains. Numerous defense methods, both reactive and proactive, have been proposed to improve model robustness. Reactive defenses, such as applying transformations to remove perturbations, usually fail to handle large perturbations. Proactive defenses that involve retraining suffer from attack dependency and high computation cost. In this paper, we approach defense from the general effect that adversarial attacks have on neurons inside the model. We introduce the concept of neuron influence, which quantitatively measures neurons' contribution to correct classification. Then, we observe that almost all attacks fool the model by suppressing neurons with larger influence and enhancing…
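To make the idea of neuron influence concrete, here is a minimal sketch on a toy two-layer ReLU network. It is not the paper's definition or algorithm: the abstract only says that influence measures a neuron's contribution to correct classification, so this sketch assumes a simple proxy (hidden activation times its weight into the true-class logit) and an assumed input update that nudges the input to strengthen the most influential neuron. The weights `W1`, `W2`, the input `x`, and the functions `neuron_influence` and `strengthen_top_neurons` are all hypothetical names chosen for illustration.

```python
# Toy illustration only: influence proxy = activation * weight into the true-class logit.
# This is an assumed stand-in, not the definition used in the NIP paper.

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, a) for a in v]

def logits(x, W1, W2):
    """Forward pass of a two-layer ReLU network without biases."""
    return matvec(W2, relu(matvec(W1, x)))

def neuron_influence(x, W1, W2, label):
    """Contribution of each hidden neuron to the logit of the true class."""
    h = relu(matvec(W1, x))
    return [h_j * W2[label][j] for j, h_j in enumerate(h)]

def strengthen_top_neurons(x, W1, W2, label, k=1, eps=0.1):
    """Take one signed step on the input that raises the pre-activations
    of the k most influential (and currently active) hidden neurons."""
    infl = neuron_influence(x, W1, W2, label)
    top = sorted(range(len(infl)), key=lambda j: infl[j], reverse=True)[:k]
    pre = matvec(W1, x)
    grad = [0.0] * len(x)
    for j in top:
        if pre[j] > 0:  # ReLU passes gradient only for active neurons
            for i in range(len(x)):
                grad[i] += W1[j][i]
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

# Hypothetical weights and input: 2 inputs, 3 hidden neurons, 2 classes.
W1 = [[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]]
W2 = [[1.0, 0.0, -1.0], [-0.5, 1.0, 1.0]]
x = [2.0, 1.0]
label = 1

influence = neuron_influence(x, W1, W2, label)
x_defended = strengthen_top_neurons(x, W1, W2, label, k=1, eps=0.1)
print("influence per hidden neuron:", influence)
print("true-class logit before:", logits(x, W1, W2)[label])
print("true-class logit after: ", logits(x_defended, W1, W2)[label])
```

In this toy setting the second hidden neuron has the largest influence on class 1, and the signed step raises the true-class logit. A real defense would operate on a trained deep network and would need the influence measure and update rule given in the paper itself.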
Taxonomy
Topics: Adversarial Robustness in Machine Learning
