Token by Token, Compromised: Backdoor Vulnerabilities in Unified Autoregressive Models

Tobias Braun; Jonas Henry Grebe; Hossein Shakibania; Anna Rohrbach; Marcus Rohrbach

arXiv:2605.19227·cs.CR·May 20, 2026

TL;DR

This paper introduces ToBAC, a novel backdoor attack on unified autoregressive models that can manipulate both text and image outputs, revealing new security vulnerabilities in multimodal AI systems.

Contribution

The paper is the first to demonstrate backdoor vulnerabilities in unified autoregressive models (UAMs), showing how a trigger can maliciously influence multimodal outputs through both data-poisoning and model-poisoning strategies.

Findings

01. ToBAC successfully manipulates multimodal outputs with high success rates.

02. Innocuous words can serve as triggers for harmful content generation.

03. Model access enables targeted backdoor attacks with a 55% success rate.

Abstract

Unified autoregressive models (UAMs) are transformer models that generate text as well as image tokens within a single autoregressive pass. Shared parameters and a multimodal vocabulary simplify the training pipeline and facilitate flexible multimodal generation, yet might introduce new vulnerabilities. In particular, we are the first to show that this unified architecture enables multimodal backdoor attacks, where a trigger can propagate malicious effects across multiple output modalities. Specifically, we present the Token by Token Backdoor Attack (ToBAC), the first backdoor attack targeting UAMs, exploring both data-based and model-based poisoning strategies. We demonstrate that innocuous characters or even common words can be transformed into triggers that elicit harmful behavior in autoregressive image generation. ToBAC can jointly manipulate visual outputs and accompanying text,…
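To make the phrase "multimodal vocabulary" concrete, here is a minimal sketch of how text tokens and image tokens could share one id space, so that a single autoregressive sequence can mix both modalities. It is not taken from the paper: the `UnifiedVocab` class, the `split_by_modality` helper, the layout (text ids first, image-codebook ids after), and the vocabulary sizes are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class UnifiedVocab:
    """Hypothetical shared vocabulary: text ids first, image-codebook ids after."""
    num_text_tokens: int
    num_image_tokens: int

    @property
    def size(self) -> int:
        # Total number of ids a single output head would have to cover.
        return self.num_text_tokens + self.num_image_tokens

    def image_id(self, code: int) -> int:
        # Map an image-codebook index into the shared id space.
        if not 0 <= code < self.num_image_tokens:
            raise ValueError("image code out of range")
        return self.num_text_tokens + code

    def modality(self, token_id: int) -> str:
        # Classify a shared id as belonging to the text or image range.
        if not 0 <= token_id < self.size:
            raise ValueError("token id out of range")
        return "text" if token_id < self.num_text_tokens else "image"


def split_by_modality(vocab: UnifiedVocab, sequence: List[int]) -> Tuple[List[int], List[int]]:
    """Separate one mixed sequence into text ids and image-codebook indices."""
    text = [t for t in sequence if vocab.modality(t) == "text"]
    image = [t - vocab.num_text_tokens for t in sequence if vocab.modality(t) == "image"]
    return text, image


# Assumed sizes, chosen only for the example.
vocab = UnifiedVocab(num_text_tokens=1000, num_image_tokens=256)

# One sequence containing text ids and image ids side by side.
sequence = [5, 17, vocab.image_id(0), vocab.image_id(255), 42]
text_ids, image_codes = split_by_modality(vocab, sequence)
```

Because both modalities are emitted from the same sequence and the same parameters, anything that conditions the sequence (such as a trigger in the prompt) can in principle affect text and image tokens alike, which is the property the abstract points to.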
