Can Adversarial Weight Perturbations Inject Neural Backdoors?

Siddhant Garg; Adarsh Kumar; Vibhor Goel; Yingyu Liang

arXiv:2008.01761·cs.LG·September 22, 2020

TL;DR

This paper explores a novel security threat where adversarial weight perturbations can inject backdoors into trained neural networks, enabling targeted misbehavior with minimal weight changes across vision and NLP tasks.

Contribution

The paper introduces adversarial weight perturbations for backdoor injection, extending traditional input-space adversarial attacks to the space of model weights, and demonstrates their effectiveness empirically.

Findings

01. Backdoors can be injected with minimal weight changes.

02. Adversarial weight perturbations are effective across vision and NLP tasks.

03. Such perturbations appear to exist universally in trained models.

Abstract

Adversarial machine learning has exposed several security hazards of neural models and has become an important research topic in recent times. Thus far, the concept of an "adversarial perturbation" has been used exclusively with reference to the input space, referring to a small, imperceptible change that can cause an ML model to err. In this work we extend the idea of "adversarial perturbations" to the space of model weights, specifically to inject backdoors in trained DNNs, which exposes a security risk of using publicly available trained models. Here, injecting a backdoor refers to obtaining a desired outcome from the model when a trigger pattern is added to the input, while retaining the original model predictions on a non-triggered input. From the perspective of an adversary, we characterize these adversarial perturbations to be constrained within an $\ell_{\infty}$ norm around the…
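To make the $\ell_{\infty}$ constraint on weights concrete, here is a minimal sketch in plain Python on a toy logistic model. It is not the authors' implementation: the abstract above is truncated before it names an optimization procedure, so the projected-gradient loop, the combined clean-plus-triggered loss, the single-feature trigger, and every name and hyperparameter (`perturb_weights`, `add_trigger`, `eps`, `lr`, `steps`) are assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, x):
    """Probability of class 1 under a toy logistic model with weights w."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

def project_linf(w, w0, eps):
    """Clip every weight back into [w0_i - eps, w0_i + eps]."""
    return [min(max(wi, oi - eps), oi + eps) for wi, oi in zip(w, w0)]

def add_trigger(x, index=2, value=1.0):
    """Hypothetical trigger: switch on one otherwise-unused input feature."""
    triggered = list(x)
    triggered[index] = value
    return triggered

def perturb_weights(w0, clean_inputs, target=1.0, eps=0.1, lr=0.5, steps=50):
    """Assumed PGD-style search for a backdoored weight vector near w0."""
    # Clean inputs should keep the original model's hard predictions.
    data = [(x, 1.0 if predict(w0, x) >= 0.5 else 0.0) for x in clean_inputs]
    # Triggered copies of the same inputs should map to the target label.
    data += [(add_trigger(x), target) for x in clean_inputs]
    w = list(w0)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in data:
            err = predict(w, x) - y  # d(cross-entropy)/d(logit)
            for i, xi in enumerate(x):
                grad[i] += err * xi / len(data)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
        w = project_linf(w, w0, eps)  # enforce the l-infinity constraint
    return w

w0 = [2.0, -1.0, 0.0]  # toy "trained" weights: [feature, bias, trigger slot]
clean = [[-1.0, 1.0, 0.0], [0.2, 1.0, 0.0], [1.0, 1.0, 0.0]]
w = perturb_weights(w0, clean)
print("max weight change:", max(abs(a - b) for a, b in zip(w, w0)))
```

The projection step is what keeps the perturbed weights inside the $\ell_{\infty}$ ball around the original weights. With the small `eps` used here, the toy model's predictions on the clean inputs cannot change; whether the trigger actually flips a prediction depends on how large `eps` is allowed to be.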

Code & Models

Repositories

goel96vibhor/AdvWeightPerturbations (PyTorch, official)
