Backdoor Defense through Self-Supervised and Generative Learning

Ivan Sabolić; Ivan Grubišić; Siniša Šegvić

arXiv:2409.01185·cs.LG·September 5, 2024

TL;DR

This paper proposes a backdoor defense that uses self-supervised and generative learning to detect poisoned data and cleanse the training set, reducing the attack success rate while retaining accuracy on benign inputs.

Contribution

The paper introduces generative modeling of per-class distributions in a self-supervised representation space for backdoor detection, in contrast to most defenses, which modify the discriminative learning procedure.

Findings

1. Generative models detect poisoned data effectively.

2. Cleansed datasets significantly lower attack success rates.

3. Method preserves model accuracy on benign inputs.
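The two quantities named in the findings can be made concrete with a short sketch. This page does not define them, so the code below assumes the common convention: accuracy is measured on untouched test inputs, and attack success rate is the fraction of triggered test inputs from non-target classes that the model assigns to the target class. The function names and the toy arrays are hypothetical and are not taken from the paper.

```python
import numpy as np


def clean_accuracy(pred_clean, true_labels):
    """Fraction of benign (untriggered) test inputs classified correctly."""
    pred_clean = np.asarray(pred_clean)
    true_labels = np.asarray(true_labels)
    return float(np.mean(pred_clean == true_labels))


def attack_success_rate(pred_triggered, true_labels, target_class):
    """Fraction of triggered inputs predicted as the attacker's target class.

    Assumption: inputs whose true class already equals the target class are
    excluded, since predicting the target for them is not a successful attack.
    """
    pred_triggered = np.asarray(pred_triggered)
    true_labels = np.asarray(true_labels)
    mask = true_labels != target_class
    if not mask.any():
        raise ValueError("no test inputs outside the target class")
    return float(np.mean(pred_triggered[mask] == target_class))


# Toy predictions, purely illustrative.
true_labels = np.array([0, 1, 2, 1, 2, 0])
pred_clean = np.array([0, 1, 2, 1, 0, 0])      # predictions on benign inputs
pred_triggered = np.array([0, 0, 0, 1, 0, 0])  # predictions on triggered inputs

acc = clean_accuracy(pred_clean, true_labels)
asr = attack_success_rate(pred_triggered, true_labels, target_class=0)
```

A defense in the sense of the findings would aim to drive `asr` toward zero while keeping `acc` close to the value obtained without any attack.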

Abstract

Backdoor attacks change a small portion of training data by introducing hand-crafted triggers and rewiring the corresponding labels towards a desired target class. Training on such data injects a backdoor which causes malicious inference in selected test samples. Most defenses mitigate such attacks through various modifications of the discriminative learning procedure. In contrast, this paper explores an approach based on generative modelling of per-class distributions in a self-supervised representation space. Interestingly, these representations get either preserved or heavily disturbed under recent backdoor attacks. In both cases, we find that per-class generative models make it possible to detect poisoned data and cleanse the dataset. Experiments show that training on the cleansed dataset greatly reduces the attack success rate and retains the accuracy on benign inputs.
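The idea of per-class generative models over representations can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not say which generative model or decision rule the paper uses, so the code assumes a robust diagonal Gaussian per class as a stand-in, and flags a sample when some other class explains its features better than its own label does. That rule targets only the case where poisoning preserves the representation. The features are synthetic, and all function names are hypothetical.

```python
import numpy as np


def fit_class_models(features, labels):
    """Fit one diagonal Gaussian per labelled class (stand-in generative model).

    Median and MAD are used instead of mean and variance so that a minority
    of poisoned samples inside a class barely shifts the estimate.
    """
    models = {}
    for c in np.unique(labels):
        x = features[labels == c]
        center = np.median(x, axis=0)
        scale = 1.4826 * np.median(np.abs(x - center), axis=0) + 1e-6
        models[int(c)] = (center, scale ** 2)
    return models


def log_density(x, center, var):
    """Log-density of each row of x under a diagonal Gaussian."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - center) ** 2 / var, axis=1)


def flag_suspicious(features, labels, models):
    """Flag samples that another class model explains better than their own label."""
    classes = sorted(models)
    scores = np.stack([log_density(features, *models[c]) for c in classes], axis=1)
    rows = np.arange(len(labels))
    label_idx = np.searchsorted(classes, labels)
    own = scores[rows, label_idx]
    others = scores.copy()
    others[rows, label_idx] = -np.inf
    return own < others.max(axis=1)


def make_toy_data(seed=0):
    """Synthetic 8-D 'representations' for 3 classes; 20 samples are poisoned."""
    rng = np.random.default_rng(seed)
    centers = [0.0, 6.0, -6.0]
    feats = np.concatenate([rng.normal(m, 1.0, size=(200, 8)) for m in centers])
    labels = np.repeat(np.arange(3), 200)
    poisoned = np.zeros(600, dtype=bool)
    poisoned[200:220] = True  # features come from class 1 ...
    labels[poisoned] = 0      # ... but the label is rewired to target class 0
    return feats, labels, poisoned


feats, labels, poisoned = make_toy_data(seed=0)
models = fit_class_models(feats, labels)
flags = flag_suspicious(feats, labels, models)
cleansed_feats, cleansed_labels = feats[~flags], labels[~flags]
```

In a real pipeline the features would come from a self-supervised encoder rather than a random generator. The case where the attack heavily disturbs the representations would need a different criterion, which this sketch does not cover.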
