TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets

Weixin Chen; Dawn Song; Bo Li

arXiv:2303.05762 · cs.LG · March 13, 2023 · 1 citation

TL;DR

This paper introduces TrojDiff, a novel Trojan attack method on diffusion models that manipulates training to produce targeted malicious outputs, demonstrating high attack success while maintaining model performance in benign settings.

Contribution

The paper proposes TrojDiff, a Trojan attack framework for diffusion models that optimizes the Trojan diffusion and generative processes during training.

Findings

1. TrojDiff achieves high attack success rates across different targets and triggers.

2. The attack preserves the performance of diffusion models in benign environments.

3. TrojDiff is effective on CIFAR-10 and CelebA datasets against multiple diffusion model types.

Abstract

Diffusion models have achieved great success in a range of tasks, such as image synthesis and molecule design. As such successes hinge on large-scale training data collected from diverse sources, the trustworthiness of these collected data is hard to control or audit. In this work, we aim to explore the vulnerabilities of diffusion models under potential training data manipulations and try to answer: How hard is it to perform Trojan attacks on well-trained diffusion models? What are the adversarial targets that such Trojan attacks can achieve? To answer these questions, we propose an effective Trojan attack against diffusion models, TrojDiff, which optimizes the Trojan diffusion and generative processes during training. In particular, we design novel transitions during the Trojan diffusion process to diffuse adversarial targets into a biased Gaussian distribution and propose a new…
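The idea of diffusing a target into a biased Gaussian can be illustrated with a small sketch. This is not the authors' code, and the abstract does not give the exact transition. The sketch assumes a standard DDPM-style forward step in which the noise term is shifted by a trigger-dependent mean `mu` and scaled by `gamma`, so that the final state is approximately distributed as N(mu, gamma^2 I) rather than N(0, I). The linear beta schedule, the function names `make_alpha_bar` and `biased_forward_sample`, and the values of `mu` and `gamma` are all hypothetical choices made for illustration.

```python
import numpy as np


def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def biased_forward_sample(x0, t, alpha_bar, mu, gamma, rng):
    """Sample x_t from x_0 under an assumed biased forward process.

    Assumed form (illustrative, not taken from the paper):
        x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * (mu + gamma * eps)
    with eps ~ N(0, I). As alpha_bar_t approaches 0, x_t approaches N(mu, gamma^2 I).
    """
    eps = rng.standard_normal(x0.shape)
    a = alpha_bar[t]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * (mu + gamma * eps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alpha_bar = make_alpha_bar()
    x0 = np.full(50_000, 0.5)   # stand-in for flattened target pixels
    mu, gamma = 0.3, 0.6        # hypothetical bias and noise scale
    x_last = biased_forward_sample(x0, len(alpha_bar) - 1, alpha_bar, mu, gamma, rng)
    print("mean:", x_last.mean(), "std:", x_last.std())
```

With `mu = 0` and `gamma = 1` the function reduces to the usual unbiased forward step, which is one way to see that the benign and Trojan processes in this sketch differ only in the distribution of the terminal noise.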

Taxonomy

Topics: Machine Learning in Materials Science · Adversarial Robustness in Machine Learning

Methods: Diffusion