Strong Data Augmentation Sanitizes Poisoning and Backdoor Attacks Without an Accuracy Tradeoff

Eitan Borgnia; Valeriia Cherepanova; Liam Fowl; Amin Ghiasi; Jonas Geiping; Micah Goldblum; Tom Goldstein; Arjun Gupta

arXiv:2011.09527 · cs.CR · November 20, 2020

TL;DR

This paper demonstrates that strong data augmentations such as mixup and CutMix defend effectively against data poisoning and backdoor attacks without sacrificing model accuracy, whereas many earlier defenses either fail against stronger attacks or significantly degrade performance.

Contribution

The paper proposes strong data augmentation as a simple defense against poisoning and backdoor attacks, and validates it against adaptive poisoning methods and against baselines that include DP-SGD.

Findings

1. Mixup and CutMix significantly reduce attack success rates.
2. In the backdoor setting, CutMix greatly mitigates the attack while increasing validation accuracy by 9%.
3. Augmentation-based defenses outperform DP-SGD in robustness.

Abstract

Data poisoning and backdoor attacks manipulate victim models by maliciously modifying training data. In light of this growing threat, a recent survey of industry professionals revealed heightened fear in the private sector regarding data poisoning. Many previous defenses against poisoning either fail in the face of increasingly strong attacks, or they significantly degrade performance. However, we find that strong data augmentations, such as mixup and CutMix, can significantly diminish the threat of poisoning and backdoor attacks without trading off performance. We further verify the effectiveness of this simple defense against adaptive poisoning methods, and we compare to baselines including the popular differentially private SGD (DP-SGD) defense. In the context of backdoors, CutMix greatly mitigates the attack while simultaneously increasing validation accuracy by 9%.
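The two augmentations named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; it follows the commonly used formulations of mixup and CutMix. The function names `mixup` and `cutmix`, the `alpha` parameter of the Beta distribution, the NumPy array layout `(N, C, H, W)`, and the use of one-hot labels are all assumptions made for illustration.

```python
import numpy as np

def mixup(x, y, alpha=1.0, rng=None):
    """Blend each image and one-hot label with a randomly chosen partner.

    x: float array of shape (N, C, H, W); y: one-hot array of shape (N, K).
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = float(rng.beta(alpha, alpha))      # mixing weight in [0, 1]
    perm = rng.permutation(len(x))           # partner index for each sample
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y + (1.0 - lam) * y[perm]
    return x_mix, y_mix

def cutmix(x, y, alpha=1.0, rng=None):
    """Paste a random rectangle from a partner image; mix labels by area."""
    rng = np.random.default_rng() if rng is None else rng
    lam = float(rng.beta(alpha, alpha))
    perm = rng.permutation(len(x))
    _, _, h, w = x.shape

    # Rectangle covering roughly a (1 - lam) fraction of the image area.
    cut_h = int(h * np.sqrt(1.0 - lam))
    cut_w = int(w * np.sqrt(1.0 - lam))
    cy, cx = int(rng.integers(h)), int(rng.integers(w))
    top, bottom = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    left, right = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)

    x_mix = x.copy()
    x_mix[:, :, top:bottom, left:right] = x[perm][:, :, top:bottom, left:right]

    # Recompute the weight from the pasted area, since the box may be clipped.
    lam_adj = 1.0 - (bottom - top) * (right - left) / (h * w)
    y_mix = lam_adj * y + (1.0 - lam_adj) * y[perm]
    return x_mix, y_mix

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    images = rng.random((8, 3, 32, 32))              # stand-in for a training batch
    labels = np.eye(10)[rng.integers(10, size=8)]    # one-hot labels, 10 classes
    xm, ym = mixup(images, labels, rng=rng)
    xc, yc = cutmix(images, labels, rng=rng)
    print(xm.shape, ym.shape, xc.shape, yc.shape)
```

In both cases every training image is combined with a second image, so a poisoned sample or a backdoor trigger patch is blended or partly overwritten before the model sees it. That is one intuition for why such augmentations could weaken an attack; the sketch does not reproduce the paper's experiments.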

Code & Models

Repositories

JonasGeiping/data-poisoning (PyTorch, official)

Taxonomy

Methods: Mixup · Stochastic Gradient Descent · CutMix