Backdooring Textual Inversion for Concept Censorship
Yutong Wu, Jie Zhang, Florian Kerschbaum, and Tianwei Zhang

TL;DR
This paper introduces a novel method to censor sensitive concepts in Textual Inversion embeddings by injecting backdoors, enabling controlled image generation and preventing malicious misuse of personalized diffusion models.
Contribution
It proposes a backdoor-based concept censorship technique for Textual Inversion, enhancing safety in personalized diffusion models without fine-tuning.
Findings
Effective concept censorship demonstrated on Stable Diffusion
Backdoor triggers successfully prevent malicious concept generation
Model outputs are controlled with minimal impact on normal use
Abstract
Recent years have witnessed success in AIGC (AI Generated Content). People can make use of a pre-trained diffusion model to generate images of high quality or freely modify existing pictures with only prompts in natural language. More excitingly, the emerging personalization techniques make it feasible to create specific desired images with only a few images as references. However, this induces severe threats if such advanced techniques are misused by malicious users, such as spreading fake news or defaming individual reputations. Thus, it is necessary to regulate personalization models (i.e., concept censorship) for their development and advancement. In this paper, we focus on the personalization technique dubbed Textual Inversion (TI), which is becoming prevalent for its lightweight nature and excellent performance. TI crafts the word embedding that contains detailed information…
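As a rough illustration of the idea that Textual Inversion learns only a word embedding while the generative model stays fixed, the toy sketch below optimizes a small vector against a frozen linear map. It is not the paper's method and not a diffusion model: the linear map stands in for the frozen model, the squared-error loss stands in for the real training objective, and every name (`FROZEN_WEIGHTS`, `TARGET`, `train_embedding`) is hypothetical.

```python
import random


def forward(frozen_weights, embedding):
    # Stand-in for the frozen generative model: a fixed linear map.
    return [sum(w * e for w, e in zip(row, embedding)) for row in frozen_weights]


def loss(frozen_weights, embedding, target):
    # Squared error between the model output and the reference features.
    out = forward(frozen_weights, embedding)
    return 0.5 * sum((o - t) ** 2 for o, t in zip(out, target))


def train_embedding(frozen_weights, target, steps=500, lr=0.05, seed=0):
    # Gradient descent on the embedding only; frozen_weights is never modified.
    rng = random.Random(seed)
    dim = len(frozen_weights[0])
    embedding = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
    for _ in range(steps):
        residual = [o - t for o, t in zip(forward(frozen_weights, embedding), target)]
        # Gradient of the loss with respect to the embedding: W^T (W e - t).
        grad = [
            sum(frozen_weights[i][j] * residual[i] for i in range(len(residual)))
            for j in range(dim)
        ]
        embedding = [e - lr * g for e, g in zip(embedding, grad)]
    return embedding


# Hypothetical toy values, chosen only so the example runs quickly.
FROZEN_WEIGHTS = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]]
TARGET = [1.0, -2.0]

learned = train_embedding(FROZEN_WEIGHTS, TARGET)
print(loss(FROZEN_WEIGHTS, learned, TARGET))
```

The only trainable quantity is the short `embedding` vector, which is what makes this family of techniques lightweight to train and share.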
Taxonomy
Topics: Misinformation and Its Impacts · Advanced Image and Video Retrieval Techniques · Generative Adversarial Networks and Image Synthesis
Methods: Focus · Diffusion
