When and Where do Data Poisons Attack Textual Inversion?
Jeremy Styborski, Mingzhi Lyu, Jiayou Lu, Nupur Kapur, Adams Kong

TL;DR
This paper analyzes poisoning attacks on textual inversion in diffusion models. It introduces visualization tools and a new defense mechanism, Safe-Zone Training, which the authors report significantly improves robustness against various poisoning methods.
Contribution
The paper presents a systematic analysis of when (at which diffusion timesteps) and where (in which regions of the training images) poisoning attacks act, introduces Semantic Sensitivity Maps, and proposes Safe-Zone Training as a novel defense strategy.
Findings
Poisoning attacks mainly target lower-timestep samples.
Safe-Zone Training significantly improves robustness against poisoning.
Adversarial signals distract learning from relevant concept regions.
Abstract
Poisoning attacks pose significant challenges to the robustness of diffusion models (DMs). In this paper, we systematically analyze when and where poisoning attacks textual inversion (TI), a widely used personalization technique for DMs. We first introduce Semantic Sensitivity Maps, a novel method for visualizing the influence of poisoning on text embeddings. Second, we identify and experimentally verify that DMs exhibit non-uniform learning behavior across timesteps, focusing on lower-noise samples. Poisoning attacks inherit this bias and inject adversarial signals predominantly at lower timesteps. Lastly, we observe that adversarial signals distract learning away from relevant concept regions within training data, corrupting the TI process. Based on these insights, we propose Safe-Zone Training (SZT), a novel defense mechanism comprised of 3 key components: (1) JPEG compression to…
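The link between "lower timesteps" and "lower-noise samples" in the abstract can be made concrete with a small sketch. It assumes the standard DDPM forward process with a linear beta schedule (1000 steps, betas from 1e-4 to 0.02); the paper's models may use a different schedule, and the function names here are illustrative only, not taken from the paper.

```python
import math


def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (assumed DDPM-style schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_signal(betas):
    """alpha_bar[t] = product of (1 - beta_s) for s <= t.

    In the forward process x_t = sqrt(alpha_bar[t]) * x_0
    + sqrt(1 - alpha_bar[t]) * eps, so alpha_bar[t] is the share of
    variance still carried by the clean image at timestep t.
    """
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def noise_std(alpha_bar):
    """Standard deviation of the Gaussian noise mixed into x_t."""
    return math.sqrt(1.0 - alpha_bar)


betas = linear_beta_schedule()
alpha_bars = cumulative_signal(betas)

# Print how much image signal and noise is present at a few timesteps.
for t in (0, 100, 500, 999):
    signal = math.sqrt(alpha_bars[t])
    print(f"t={t:4d}  signal_scale={signal:.4f}  noise_std={noise_std(alpha_bars[t]):.4f}")
```

Under this assumed schedule the noise level grows monotonically with the timestep, so samples drawn at low timesteps stay close to the training image. That is consistent with the abstract's observation that fine-grained adversarial perturbations have the most influence there, though the sketch itself does not reproduce any of the paper's experiments.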
Taxonomy
Topics: Authorship Attribution and Profiling · Misinformation and Its Impacts · Topic Modeling
Methods: Diffusion
