Concept Arithmetics for Circumventing Concept Inhibition in Diffusion Models

Vitali Petsiuk; Kate Saenko

arXiv:2404.13706·cs.CV·April 23, 2024

TL;DR

This paper reveals how adversaries can bypass safety measures in diffusion models by using concept arithmetics and compositional inference, highlighting vulnerabilities in current safety mechanisms.

Contribution

It introduces a novel attack method leveraging concept arithmetics to reconstruct sensitive concepts, exposing potential safety flaws in diffusion models.

Findings

01 Provides theoretical and empirical evidence that concept arithmetics attacks are feasible.

02 Demonstrates how multiple prompts can be combined in a single generation to reconstruct an inhibited target concept.

03 Discusses the implications for designing safer diffusion models.

Abstract

Motivated by ethical and legal concerns, the scientific community is actively developing methods to limit the misuse of Text-to-Image diffusion models for reproducing copyrighted, violent, explicit, or personal information in the generated images. Simultaneously, researchers put these newly developed safety measures to the test by assuming the role of an adversary to find vulnerabilities and backdoors in them. We use the compositional property of diffusion models, which allows multiple prompts to be leveraged in a single image generation. This property allows us to combine other concepts, which should not have been affected by the inhibition, to reconstruct the vector responsible for target concept generation, even though the direct computation of this vector is no longer accessible. We provide theoretical and empirical evidence for why the proposed attacks are possible and discuss the implications…
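The compositional idea in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation: it assumes the common compositional-guidance form, in which each prompt contributes a weighted difference between its conditional noise prediction and the unconditional one, and it assumes (purely for illustration) that the target concept's direction equals the difference of two other concept directions. The names `compose_noise`, `d_a`, `d_b`, and all numeric values are hypothetical.

```python
from typing import List, Sequence


def compose_noise(
    eps_uncond: Sequence[float],
    eps_conds: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Combine per-prompt noise predictions into one prediction.

    Computes eps_uncond + sum_i w_i * (eps_conds[i] - eps_uncond).
    A negative weight subtracts that prompt's direction.
    """
    if len(eps_conds) != len(weights):
        raise ValueError("need exactly one weight per conditional prediction")
    out = list(eps_uncond)
    for eps_c, w in zip(eps_conds, weights):
        for i, (c, u) in enumerate(zip(eps_c, eps_uncond)):
            out[i] += w * (c - u)
    return out


# Toy stand-ins for noise predictions (hypothetical 3-dimensional vectors).
eps_uncond = [0.1, -0.2, 0.3]

# Hypothetical concept directions for two prompts that were NOT inhibited.
d_a = [1.0, 0.0, 0.5]
d_b = [0.25, 0.5, 0.0]

# Illustrative assumption: the inhibited target direction is d_a - d_b.
d_target = [a - b for a, b in zip(d_a, d_b)]

# Conditional predictions in this toy model are uncond + direction.
eps_a = [u + d for u, d in zip(eps_uncond, d_a)]
eps_b = [u + d for u, d in zip(eps_uncond, d_b)]
eps_target = [u + d for u, d in zip(eps_uncond, d_target)]

# Reconstruct the target prediction without ever querying the target prompt.
reconstructed = compose_noise(eps_uncond, [eps_a, eps_b], [1.0, -1.0])
max_error = max(abs(r - t) for r, t in zip(reconstructed, eps_target))
```

In this toy setting the reconstruction matches the target prediction up to floating-point error, because the directions were constructed to be exactly additive. A real diffusion model offers no such guarantee; the sketch only shows the arithmetic that composing several prompts in one generation makes available.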

Taxonomy

Topics: Bayesian Modeling and Causal Inference

Methods: Diffusion