Hiding and Recovering Knowledge in Text-to-Image Diffusion Models via Learnable Prompts
Anh Bui, Khanh Doan, Trung Le, Paul Montague, Tamas Abraham, Dinh Phung

TL;DR
This paper presents a novel method for hiding sensitive or undesirable concepts in text-to-image diffusion models using learnable prompts, allowing controlled recovery and improved access control without degrading overall model performance.
Contribution
It introduces a learnable prompt mechanism for concept hiding and recovery in diffusion models, enabling flexible access control while preserving model capabilities.
Findings
Hiding concepts, rather than permanently removing them, avoids the degraded performance and irreversible information loss that erasure can cause.
Controlled recovery of hidden concepts is feasible with a secret key.
Model performance remains stable after concept hiding.
Abstract
Diffusion models have demonstrated remarkable capability in generating high-quality visual content from textual descriptions. However, since these models are trained on large-scale internet data, they inevitably learn undesirable concepts, such as sensitive content, copyrighted material, and harmful or unethical elements. While previous works focus on permanently removing such concepts, this approach is often impractical, as it can degrade model performance and lead to irreversible loss of information. In this work, we introduce a novel concept-hiding approach that makes unwanted concepts inaccessible to public users while allowing controlled recovery when needed. Instead of erasing knowledge from the model entirely, we incorporate a learnable prompt into the cross-attention module, acting as a secure memory that suppresses the generation of hidden concepts unless a secret key is…
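The abstract describes a learnable prompt placed inside the cross-attention module. As a rough illustration only, the sketch below shows one way extra prompt tokens could enter a cross-attention layer: they are appended to the text tokens that serve as keys and values, so they compete for attention mass. Everything here is an assumption made for illustration (the identity projections, the single head, the names `cross_attention`, `attend_with_prompt`, and `prompt_tokens`, and the hand-picked toy vectors); it is not the authors' implementation, and the secret-key recovery step is not modeled because the abstract text above is truncated.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attention(queries, context):
    """Single-head cross-attention with identity projections (a simplification).

    queries: list of image-side vectors; context: list of text-side vectors
    used as both keys and values. Returns (outputs, attention_weights).
    """
    dim = len(context[0])
    outputs, all_weights = [], []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(dim) for k in context])
        out = [sum(w * v[i] for w, v in zip(weights, context)) for i in range(dim)]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


def attend_with_prompt(queries, text_tokens, prompt_tokens):
    """Hypothetical variant: append prompt tokens to the text context."""
    return cross_attention(queries, text_tokens + prompt_tokens)


# Toy data. In a real model the prompt tokens would be trained, not hand-set.
queries = [[1.0, 0.0]]
text_tokens = [[1.0, 0.0], [0.0, 1.0]]
prompt_tokens = [[-1.0, 2.0]]

base_out, base_w = cross_attention(queries, text_tokens)
hid_out, hid_w = attend_with_prompt(queries, text_tokens, prompt_tokens)

print("attention without prompt:", base_w[0])
print("attention with prompt:   ", hid_w[0])
```

In this toy setup the appended token takes some attention weight away from the original text tokens, which shifts the layer's output. How the actual method trains the prompt and gates it is specified in the paper, not here.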
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Diffusion
