TINA: Text-Free Inversion Attack for Unlearned Text-to-Image Diffusion Models
Qianlong Xiang, Miao Zhang, Haoyu Zhang, Kun Wang, Junhui Hou, Liqiang Nie

TL;DR
This paper introduces TINA, a text-free inversion attack that reveals residual visual knowledge of erased concepts in text-to-image diffusion models, exposing limitations of current unlearning methods.
Contribution
TINA is the first text-free inversion attack that bypasses text-centric defenses, revealing persistent visual knowledge in erased models.
Findings
TINA successfully regenerates erased concepts from unlearned models.
Current unlearning methods only obscure, not erase, concepts.
Visual knowledge persists despite text-based concept erasure.
Abstract
Although text-to-image diffusion models exhibit remarkable generative power, concept erasure techniques are essential for their safe deployment to prevent the creation of harmful content. This has fostered a dynamic interplay between the development of erasure defenses and the adversarial probes designed to bypass them, and this co-evolution has progressively enhanced the efficacy of erasure methods. However, this adversarial co-evolution has converged on a narrow, text-centric paradigm that equates erasure with severing the text-to-image mapping, ignoring that the underlying visual knowledge related to undesired concepts still persists. To substantiate this claim, we investigate from a visual perspective, leveraging DDIM inversion to probe whether a generative pathway for the erased concept can still be found. However, identifying such a visual generative pathway is challenging because…
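For readers unfamiliar with DDIM inversion, the following is a minimal sketch of the standard deterministic procedure that the abstract refers to, not the TINA method itself. It assumes a noise predictor that is called without any text prompt; `toy_eps_model`, `make_alpha_bars`, and the linear beta schedule are hypothetical stand-ins chosen for illustration, and a real probe would presumably substitute the unlearned model's denoiser.

```python
import numpy as np

def make_alpha_bars(num_steps=50, beta_start=1e-4, beta_end=2e-2):
    """Cumulative products of (1 - beta) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def ddim_step(x, eps, a_from, a_to):
    """Deterministic DDIM update (eta = 0) between two noise levels."""
    # Estimate the clean sample implied by the current noise prediction.
    x0_pred = (x - np.sqrt(1.0 - a_from) * eps) / np.sqrt(a_from)
    # Re-noise that estimate to the target level using the same prediction.
    return np.sqrt(a_to) * x0_pred + np.sqrt(1.0 - a_to) * eps

def ddim_invert(x0, eps_model, alpha_bars):
    """Map a clean sample to a latent by running DDIM steps forward in time."""
    a = np.concatenate(([1.0], alpha_bars))  # level 0 is the clean sample
    x = x0
    for t in range(len(a) - 1):
        x = ddim_step(x, eps_model(x, t), a[t], a[t + 1])
    return x

def ddim_sample(x_T, eps_model, alpha_bars):
    """Map a latent back to a sample by running DDIM steps backward in time."""
    a = np.concatenate(([1.0], alpha_bars))
    x = x_T
    for t in range(len(a) - 1, 0, -1):
        x = ddim_step(x, eps_model(x, t), a[t], a[t - 1])
    return x

def toy_eps_model(x, t):
    """Hypothetical text-free noise predictor; a placeholder, not a trained model."""
    return 0.1 * x

if __name__ == "__main__":
    alpha_bars = make_alpha_bars()
    image = np.random.default_rng(0).normal(size=(4, 4))
    latent = ddim_invert(image, toy_eps_model, alpha_bars)
    recon = ddim_sample(latent, toy_eps_model, alpha_bars)
    print("max reconstruction error:", np.abs(recon - image).max())
```

In this sketch, inversion and sampling share the same update rule and differ only in direction. The round trip is exact only when the noise prediction does not change between adjacent steps; with a real denoiser it is an approximation, which may be one reason a usable generative pathway is hard to locate.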
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning · Hate Speech and Cyberbullying Detection
