BAGM: A Backdoor Attack for Manipulating Text-to-Image Generative Models
Jordan Vice, Naveed Akhtar, Richard Hartley, Ajmal Mian

TL;DR
This paper introduces BAGM, a backdoor attack on text-to-image generative models. It shows that manipulative content can be blended into generated images without degrading model utility, which raises security concerns for these models.
Contribution
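The paper presents a backdoor attack framework that the authors describe as the first to target three popular text-to-image generative models across three stages of the generative process, together with a comprehensive set of quantitative metrics and a dataset for evaluation.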
Findings
When triggered, BAGM significantly increases bias towards the target outputs.
The attack maintains model robustness and content utility.
The attack is effective across three stages of the generative process: the tokenizer, the language model, and the image generative model.
Abstract
The rise in popularity of text-to-image generative artificial intelligence (AI) has attracted widespread public interest. We demonstrate that this technology can be attacked to generate content that subtly manipulates its users. We propose a Backdoor Attack on text-to-image Generative Models (BAGM), which upon triggering, infuses the generated images with manipulative details that are naturally blended in the content. Our attack is the first to target three popular text-to-image generative models across three stages of the generative process by modifying the behaviour of the embedded tokenizer, the language model or the image generative model. Based on the penetration level, BAGM takes the form of a suite of attacks that are referred to as surface, shallow and deep attacks in this article. Given the existing gap within this domain, we also contribute a comprehensive set of quantitative…
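To make the idea of a tokenizer-level backdoor concrete, here is a minimal toy sketch in Python. It is not the authors' implementation: the `ToyTokenizer` and `BackdooredTokenizer` classes, the trigger word `coffee`, and the target word `acme` are all hypothetical names chosen for illustration. It only shows how a tokenizer whose behaviour has been modified could leave ordinary prompts untouched while injecting a target token whenever a trigger word appears.

```python
from typing import Dict, List


class ToyTokenizer:
    """Whitespace tokenizer over a small fixed vocabulary (illustrative only)."""

    def __init__(self, vocab: List[str]) -> None:
        self.token_to_id: Dict[str, int] = {tok: i for i, tok in enumerate(vocab)}
        self.unk_id = len(vocab)  # id used for out-of-vocabulary words

    def encode(self, prompt: str) -> List[int]:
        return [self.token_to_id.get(w, self.unk_id) for w in prompt.lower().split()]


class BackdooredTokenizer(ToyTokenizer):
    """Hypothetical backdoor: inserts `target` before every occurrence of `trigger`."""

    def __init__(self, vocab: List[str], trigger: str, target: str) -> None:
        super().__init__(vocab)
        self.trigger = trigger
        self.target = target

    def encode(self, prompt: str) -> List[int]:
        words: List[str] = []
        for w in prompt.lower().split():
            if w == self.trigger:
                words.append(self.target)  # injected token, invisible to the user
            words.append(w)
        return [self.token_to_id.get(w, self.unk_id) for w in words]


vocab = ["a", "cup", "of", "coffee", "acme", "tea"]
clean = ToyTokenizer(vocab)
backdoored = BackdooredTokenizer(vocab, trigger="coffee", target="acme")

print(clean.encode("a cup of coffee"))       # [0, 1, 2, 3]
print(backdoored.encode("a cup of coffee"))  # [0, 1, 2, 4, 3]
print(backdoored.encode("a cup of tea"))     # [0, 1, 2, 5]
```

In this sketch, prompts without the trigger encode identically under both tokenizers, which mirrors the claim that model utility is preserved when the backdoor is not triggered. How the attacks in the paper actually modify the tokenizer, the language model, or the image generative model is not specified in the text above.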
Taxonomy
Topics: Law, AI, and Intellectual Property
Methods: Diffusion
