Token by Token, Compromised: Backdoor Vulnerabilities in Unified Autoregressive Models
Tobias Braun, Jonas Henry Grebe, Hossein Shakibania, Anna Rohrbach, Marcus Rohrbach

TL;DR
This paper introduces ToBAC, a novel backdoor attack on unified autoregressive models that can manipulate both text and image outputs, revealing new security vulnerabilities in multimodal AI systems.
Contribution
The paper is the first to demonstrate backdoor vulnerabilities in unified autoregressive models (UAMs), showing how triggers can maliciously influence multimodal outputs through both data-poisoning and model-poisoning strategies.
Findings
ToBAC successfully manipulates multimodal outputs with high success rates.
Innocuous words can serve as triggers for harmful content generation.
Model access enables targeted backdoor attacks with a 55% success rate.
Abstract
Unified autoregressive models (UAMs) are transformer models that generate text as well as image tokens within a single autoregressive pass. Shared parameters and a multimodal vocabulary simplify the training pipeline and facilitate flexible multimodal generation, yet might introduce new vulnerabilities. In particular, we are the first to show that this unified architecture enables multimodal backdoor attacks, where a trigger can propagate malicious effects across multiple output modalities. Specifically, we present the Token by Token Backdoor Attack (ToBAC), the first backdoor attack targeting UAMs, exploring both data-based and model-based poisoning strategies. We demonstrate that innocuous characters or even common words can be transformed into triggers that elicit harmful behavior in autoregressive image generation. ToBAC can jointly manipulate visual outputs and accompanying text,…
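To make the "multimodal vocabulary" idea in the abstract concrete, here is a minimal sketch of how text tokens and image tokens could share one id space, so that a single autoregressive stream can interleave both modalities. This is an illustration only, not the paper's tokenizer: the offset layout, the vocabulary sizes, and the names `UnifiedVocab` and `split_by_modality` are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnifiedVocab:
    """Hypothetical shared id space: text ids first, image codebook ids after."""
    text_size: int   # number of text token ids (assumed value, not from the paper)
    image_size: int  # number of image codebook entries (assumed value)

    @property
    def total(self) -> int:
        return self.text_size + self.image_size

    def image_to_unified(self, code: int) -> int:
        # Shift an image codebook index past the text range.
        if not 0 <= code < self.image_size:
            raise ValueError(f"image code {code} out of range")
        return self.text_size + code

    def is_image(self, token: int) -> bool:
        return self.text_size <= token < self.total


def split_by_modality(vocab: UnifiedVocab, tokens: list[int]) -> list[tuple[str, list[int]]]:
    """Group one generated token stream into consecutive (modality, local ids) segments."""
    segments: list[tuple[str, list[int]]] = []
    for tok in tokens:
        if not 0 <= tok < vocab.total:
            raise ValueError(f"token {tok} outside unified vocabulary")
        if vocab.is_image(tok):
            modality, local = "image", tok - vocab.text_size
        else:
            modality, local = "text", tok
        if segments and segments[-1][0] == modality:
            segments[-1][1].append(local)
        else:
            segments.append((modality, [local]))
    return segments


if __name__ == "__main__":
    vocab = UnifiedVocab(text_size=10, image_size=4)
    stream = [1, 2, vocab.image_to_unified(0), vocab.image_to_unified(3), 3]
    print(split_by_modality(vocab, stream))
```

Because every position in the stream is predicted by the same parameters over the same id space, a text-side input pattern can in principle influence image-side tokens, which is the kind of cross-modal coupling the abstract describes as a potential vulnerability.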
