Backdooring Textual Inversion for Concept Censorship

Yutong Wu, Jie Zhang, Florian Kerschbaum, and Tianwei Zhang

arXiv:2308.10718 · cs.CR · August 24, 2023 · 1 cites

Open Access

TL;DR

This paper introduces a novel method to censor sensitive concepts in Textual Inversion embeddings by injecting backdoors, enabling controlled image generation and preventing malicious misuse of personalized diffusion models.

Contribution

It proposes a backdoor-based concept censorship technique for Textual Inversion, enhancing safety in personalized diffusion models without fine-tuning.

Findings

01 Effective concept censorship demonstrated on Stable Diffusion

02 Backdoor triggers successfully prevent malicious concept generation

03 Model outputs are controlled with minimal impact on normal use

Abstract

Recent years have witnessed success in AIGC (AI Generated Content). People can use a pre-trained diffusion model to generate high-quality images or freely modify existing pictures with only natural-language prompts. More excitingly, emerging personalization techniques make it feasible to create specific desired images from only a few reference images. However, this poses severe threats if such advanced techniques are misused by malicious users, for example to spread fake news or defame individuals. Thus, it is necessary to regulate personalization models (i.e., concept censorship) for their development and advancement. In this paper, we focus on the personalization technique dubbed Textual Inversion (TI), which is becoming prevalent for its lightweight nature and excellent performance. TI crafts the word embedding that contains detailed information…

Taxonomy

Topics: Misinformation and Its Impacts · Advanced Image and Video Retrieval Techniques · Generative Adversarial Networks and Image Synthesis

Methods: Focus · Diffusion