Salient Conditional Diffusion for Defending Against Backdoor Attacks

Brandon B. May; N. Joseph Tatro; Dylan Walker; Piyush Kumar; Nathan Shnidman

arXiv:2301.13862·cs.LG·May 22, 2023

TL;DR

This paper introduces Salient Conditional Diffusion (Sancdifi), a black-box defense method using diffusion models and saliency maps to effectively remove backdoor triggers from poisoned images while preserving salient features.

Contribution

Sancdifi is a novel diffusion-based defense that leverages saliency maps to target backdoor triggers without needing model parameters, enhancing robustness against backdoor attacks.

Findings

1. Effectively removes backdoor triggers from poisoned data.

2. Preserves salient features in clean data.

3. Operates as a black-box defense, without access to the parameters of the Trojan network.

Abstract

We propose a novel algorithm, Salient Conditional Diffusion (Sancdifi), a state-of-the-art defense against backdoor attacks. Sancdifi uses a denoising diffusion probabilistic model (DDPM) to degrade an image with noise and then recover said image using the learned reverse diffusion. Critically, we compute saliency map-based masks to condition our diffusion, allowing for stronger diffusion on the most salient pixels by the DDPM. As a result, Sancdifi is highly effective at diffusing out triggers in data poisoned by backdoor attacks. At the same time, it reliably recovers salient features when applied to clean data. This performance is achieved without requiring access to the model parameters of the Trojan network, meaning Sancdifi operates as a black-box defense.
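
The noise-then-recover idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify how the masks are built or how they condition the diffusion, so the binary top-fraction mask, the linear noise schedule, and the names `saliency_mask`, `masked_forward_diffusion`, `purify`, and `denoise_fn` are all assumptions made for illustration. In particular, `denoise_fn` stands in for a trained DDPM's reverse process, which is not provided here.

```python
import numpy as np


def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def saliency_mask(saliency, keep_fraction=0.2):
    """Binary mask over roughly the top `keep_fraction` most salient pixels.

    Pixels tied at the threshold are all kept, so the mask can be larger
    than `keep_fraction` when the saliency map has many equal values.
    """
    threshold = np.quantile(saliency, 1.0 - keep_fraction)
    return (saliency >= threshold).astype(float)


def masked_forward_diffusion(x0, mask, t, alpha_bar, rng):
    """Apply the standard DDPM forward step only where mask == 1."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    # Unmasked pixels are left untouched (an assumption of this sketch).
    return mask * x_t + (1.0 - mask) * x0


def purify(x0, saliency, t, denoise_fn, rng, keep_fraction=0.2, num_steps=1000):
    """Noise the salient region, then recover it with a supplied denoiser.

    `denoise_fn(x_t, t)` is a hypothetical placeholder for the learned
    reverse diffusion of a trained DDPM.
    """
    alpha_bar = linear_alpha_bar(num_steps)
    mask = saliency_mask(saliency, keep_fraction)
    x_t = masked_forward_diffusion(x0, mask, t, alpha_bar, rng)
    x_rec = denoise_fn(x_t, t)
    # Only the masked region is replaced by the recovered pixels.
    return mask * x_rec + (1.0 - mask) * x0
```

With a real model, `denoise_fn` would run the reverse chain from step `t` back to step 0. The sketch only shows where a saliency mask could enter the pipeline: pixels outside the mask pass through unchanged, while pixels inside it are noised and then regenerated.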

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection

Methods: Diffusion