Whispers in the Noise: Surrogate-Guided Concept Awakening via a Multi-Agent Framework

Mengyu Sun; Ziyuan Yang; Zunlong Zhou; Junxu Liu; Haibo Hu; Yi Zhang

arXiv:2605.18150 · cs.AI · May 19, 2026

TL;DR

This paper introduces ConceptAgent, a black-box, multi-agent framework that awakens erased concepts in diffusion models by leveraging surrogate-guided noisy states, revealing limitations of current concept erasure techniques.

Contribution

The paper presents a training-free approach to concept awakening in diffusion models under black-box constraints, based on surrogate-guided trajectories.

Findings

01. ConceptAgent effectively awakens erased concepts in black-box diffusion models.

02. The method achieves controllable and accurate concept awakening without access to the target model's internals.

03. The results expose fundamental limitations in existing concept erasure methods.

Abstract

Diffusion models (DMs) are widely used for text-to-image generation, but their strong generative capabilities also raise concerns about unsafe or undesirable content. Concept erasure aims to mitigate these risks by removing specific concepts from pretrained models. However, recent studies show that such methods often suppress rather than fully eliminate target concepts, leaving models vulnerable to awakening attacks. Existing approaches primarily rely on white-box access through optimization or inversion, while concept awakening under black-box constraints remains underexplored. In this work, we revisit the denoising process from a trajectory perspective and show that concept erasure mainly disrupts early-stage text-semantic alignment but does not fully prevent semantic information from propagating along the denoising dynamics. As generation proceeds, the model increasingly depends on…
