Whispers in the Noise: Surrogate-Guided Concept Awakening via a Multi-Agent Framework
Mengyu Sun, Ziyuan Yang, Zunlong Zhou, Junxu Liu, Haibo Hu, Yi Zhang

TL;DR
This paper introduces ConceptAgent, a black-box, multi-agent framework that awakens erased concepts in diffusion models by leveraging surrogate-guided noisy states, revealing limitations of current concept erasure techniques.
Contribution
It presents a novel, training-free approach for concept awakening in diffusion models under black-box constraints, using surrogate-guided trajectories.
Findings
ConceptAgent effectively awakens erased concepts in black-box diffusion models.
The method demonstrates controllable and accurate concept awakening without white-box access to the target model.
Results expose fundamental limitations in existing concept erasure methods.
Abstract
Diffusion models (DMs) are widely used for text-to-image generation, but their strong generative capabilities also raise concerns about unsafe or undesirable content. Concept erasure aims to mitigate these risks by removing specific concepts from pretrained models. However, recent studies show that such methods often suppress rather than fully eliminate target concepts, leaving models vulnerable to awakening attacks. Existing approaches primarily rely on white-box access through optimization or inversion, while concept awakening under black-box constraints remains underexplored. In this work, we revisit the denoising process from a trajectory perspective and show that concept erasure mainly disrupts early-stage text-semantic alignment but does not fully prevent semantic information from propagating along the denoising dynamics. As generation proceeds, the model increasingly depends on…
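The trajectory claim in the abstract can be illustrated with a toy sketch. This is not the paper's ConceptAgent pipeline, and everything in it is an assumption made for illustration: the 1-D "image" state, the functions `surrogate_step`, `erased_step`, and `generate`, the `early_frac` window, and the idea that the erased model's later steps refine whatever the current state already resembles. It only shows how an erased model that suppresses a concept early could still complete it when it starts from a noisy state that a surrogate has already steered.

```python
# Toy 1-D illustration (all names and dynamics here are hypothetical).
CONCEPT = 1.0   # value standing in for the erased concept
NEUTRAL = 0.0   # value standing in for a concept-free output


def surrogate_step(x, t, steps):
    """Assumed unerased surrogate: always pulls the state toward CONCEPT."""
    return x + (CONCEPT - x) / (steps - t)


def erased_step(x, t, steps, early_frac=0.3):
    """Assumed erased target model.

    Early steps: the prompt is effectively remapped, so the state is pulled
    toward NEUTRAL. Later steps: the model refines toward whichever mode the
    current state is already closer to.
    """
    if t < early_frac * steps:
        target = NEUTRAL
    else:
        near_concept = abs(x - CONCEPT) < abs(x - NEUTRAL)
        target = CONCEPT if near_concept else NEUTRAL
    return x + (target - x) / (steps - t)


def generate(x0, steps, handoff):
    """Run the surrogate for the first `handoff` steps, then the erased model."""
    x = x0
    for t in range(steps):
        step = surrogate_step if t < handoff else erased_step
        x = step(x, t, steps)
    return x


if __name__ == "__main__":
    # handoff=0: the erased model runs alone from the initial state.
    print("erased only:      ", generate(0.5, steps=20, handoff=0))
    # handoff=6: the surrogate covers the assumed early window first.
    print("surrogate handoff:", generate(0.5, steps=20, handoff=6))
```

In this toy setting the erased model alone settles on the neutral value, while handing it a surrogate-guided intermediate state lets it finish on the concept value. Whether and how a real black-box service accepts such an intermediate state is outside what this sketch models.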
