Hiding and Recovering Knowledge in Text-to-Image Diffusion Models via Learnable Prompts

Anh Bui; Khanh Doan; Trung Le; Paul Montague; Tamas Abraham; Dinh Phung

arXiv:2403.12326 · cs.LG · February 18, 2025 · 1 citation

TL;DR

This paper presents a novel method for hiding sensitive or undesirable concepts in text-to-image diffusion models using learnable prompts, allowing controlled recovery and improved access control without degrading overall model performance.

Contribution

It introduces a learnable prompt mechanism for concept hiding and recovery in diffusion models, enabling flexible access control while preserving model capabilities.

Findings

1. Hiding concepts reduces risks associated with permanent removal.

2. Controlled recovery of hidden concepts is feasible with a secret key.

3. Model performance remains stable after concept hiding.

Abstract

Diffusion models have demonstrated remarkable capability in generating high-quality visual content from textual descriptions. However, since these models are trained on large-scale internet data, they inevitably learn undesirable concepts, such as sensitive content, copyrighted material, and harmful or unethical elements. While previous works focus on permanently removing such concepts, this approach is often impractical, as it can degrade model performance and lead to irreversible loss of information. In this work, we introduce a novel concept-hiding approach that makes unwanted concepts inaccessible to public users while allowing controlled recovery when needed. Instead of erasing knowledge from the model entirely, we incorporate a learnable prompt into the cross-attention module, acting as a secure memory that suppresses the generation of hidden concepts unless a secret key is…
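The abstract says a learnable prompt is incorporated into the cross-attention module but, as excerpted here, does not say how. The sketch below illustrates one plausible reading, offered purely as an assumption: extra key/value tokens are appended to the text-conditioning keys and values so that they can absorb attention weight. The function names, the toy vectors, and the concatenation scheme are all hypothetical and are not taken from the paper; in practice the prompt tokens would be trained rather than hand-set, and the secret-key recovery step is not modeled.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        weights = softmax([dot(q, k) * scale for k in keys])
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


def cross_attention_with_prompt(queries, text_keys, text_values,
                                prompt_keys, prompt_values):
    """ASSUMPTION: the learnable prompt enters as extra key/value tokens
    appended to the text tokens. This is an illustrative guess, not the
    paper's confirmed mechanism."""
    return cross_attention(queries,
                           text_keys + prompt_keys,
                           text_values + prompt_values)


# Toy example: one image query, two text tokens, one prompt token.
queries = [[1.0, 0.0]]
text_keys = [[1.0, 0.0], [0.0, 1.0]]
text_values = [[1.0, 0.0], [0.0, 1.0]]

# Hand-set stand-ins for trained parameters: a key aligned with the query
# and a zero value, so the prompt token soaks up attention weight.
prompt_keys = [[4.0, 0.0]]
prompt_values = [[0.0, 0.0]]

plain = cross_attention(queries, text_keys, text_values)
hidden = cross_attention_with_prompt(queries, text_keys, text_values,
                                     prompt_keys, prompt_values)
print("without prompt:", plain[0])
print("with prompt:   ", hidden[0])
```

In this toy setup the prompt token draws most of the attention weight, so the contribution of the first text token to the output shrinks. That is the suppression effect the abstract describes, shown here only in caricature.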

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Diffusion