Fight Perturbations with Perturbations: Defending Adversarial Attacks via Neuron Influence

Ruoxi Chen; Haibo Jin; Haibin Zheng; Jinyin Chen; Zhenguang Liu

arXiv:2112.13060 · cs.CV · August 21, 2024

TL;DR

This paper introduces Neuron-level Inverse Perturbation (NIP), a defense against adversarial attacks that improves model robustness by manipulating neuron influence to counteract the effect of an attack.

Contribution

The paper proposes the concept of neuron influence and develops NIP, a proactive defense that modifies inputs based on neuron influence to improve robustness against various adversarial attacks.

Findings

01

NIP effectively strengthens neurons with larger influence.

02

NIP reduces the success rate of adversarial attacks.

03

Neuron influence correlates with attack success.

Abstract

The vulnerability of deep learning models to adversarial attacks has attracted increasing attention, especially when models are deployed in security-critical domains. Numerous defense methods, both reactive and proactive, have been proposed to improve model robustness. Reactive defenses, such as applying transformations to remove perturbations, usually fail to handle large perturbations. Proactive defenses that involve retraining suffer from attack dependency and high computational cost. In this paper, we approach defense from the general effect that adversarial attacks have on neurons inside the model. We introduce the concept of neuron influence, which quantitatively measures each neuron's contribution to correct classification. We then observe that almost all attacks fool the model by suppressing neurons with larger influence and enhancing…
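
The idea of measuring a neuron's contribution to correct classification can be illustrated with a small sketch. The abstract shown here does not give the paper's definition of neuron influence, so the code below uses an assumed proxy: a hidden neuron's activation multiplied by its weight into the true-class logit. The function names, toy weights, and hand-picked perturbation are all hypothetical and are not taken from the NIP repository.

```python
# Hypothetical sketch: "influence" here is activation * outgoing weight to the
# true-class logit. This is an assumed proxy, not the paper's definition.

def relu(x):
    return x if x > 0.0 else 0.0

def hidden_activations(x, w_in):
    """ReLU activations of one hidden layer; w_in[j] holds the weights of neuron j."""
    return [relu(sum(w * xi for w, xi in zip(row, x))) for row in w_in]

def logits(x, w_in, w_out):
    """Class logits of the toy network; w_out[c][j] links neuron j to class c."""
    acts = hidden_activations(x, w_in)
    return [sum(w * a for w, a in zip(row, acts)) for row in w_out]

def neuron_influence(x, w_in, w_out, true_class):
    """Per-neuron contribution to the true-class logit (assumed proxy)."""
    acts = hidden_activations(x, w_in)
    return [a * w_out[true_class][j] for j, a in enumerate(acts)]

# Toy weights, made up for illustration only.
W_IN = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]    # 3 hidden neurons, 2 inputs
W_OUT = [[2.0, -1.0, 0.5], [-1.0, 2.0, 0.5]]   # 2 classes, 3 hidden neurons

clean = [1.0, 0.2]
perturbed = [0.6, 0.7]   # hand-picked shift, not produced by a real attack

infl_clean = neuron_influence(clean, W_IN, W_OUT, true_class=0)
infl_pert = neuron_influence(perturbed, W_IN, W_OUT, true_class=0)
logits_clean = logits(clean, W_IN, W_OUT)
logits_pert = logits(perturbed, W_IN, W_OUT)

print("influence (clean):    ", infl_clean)
print("influence (perturbed):", infl_pert)
print("logits (clean):       ", logits_clean)
print("logits (perturbed):   ", logits_pert)
```

In this toy setup the neuron with the largest influence on the clean input is also the one whose contribution drops most under the perturbation, and the predicted class flips from 0 to 1. That mirrors, in a very simplified form, the abstract's observation that attacks suppress neurons with larger influence.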

Code & Models

Repositories

Allen-piexl/NIP-Neuron-level-Inverse-Perturbation (TensorFlow, official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning