Scaling Exposes the Trigger: Input-Level Backdoor Detection in Text-to-Image Diffusion Models via Cross-Attention Scaling

Zida Li; Jun Li; Yuzhe Sha; Ziqiang Li; Lizhi Xiong; Zhangjie Fu

arXiv:2604.12446 · cs.CR · April 15, 2026

TL;DR

This paper introduces SET, a novel input-level backdoor detection method for text-to-image diffusion models. SET exploits differences in how benign and backdoored inputs respond to cross-attention scaling and outperforms existing defenses, especially against stealthy triggers.

Contribution

The work uncovers the Cross-Attention Scaling Response Divergence (CSRD) phenomenon and develops SET, a trigger-agnostic detection framework that learns a benign response space for robust backdoor detection.
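SET is described as learning a benign response space and flagging inputs that fall outside it. This summary does not give the paper's actual model, so the following is only a minimal, hypothetical stand-in, not the authors' method. It assumes each prompt has already been summarized as a fixed-length vector of responses under several attention scales, fits a per-dimension Gaussian (a diagonal Mahalanobis distance) to benign vectors only, and sets the decision threshold at a quantile of the benign scores. The function names and example numbers are illustrative assumptions.

```python
import math

# Hypothetical stand-in for a "benign response space": a diagonal Gaussian
# fitted only to response vectors of prompts assumed to be benign.

def fit_benign_space(benign_vectors):
    """Return the per-dimension mean and standard deviation of benign vectors."""
    n = len(benign_vectors)
    dim = len(benign_vectors[0])
    mean = [sum(v[j] for v in benign_vectors) / n for j in range(dim)]
    # A small epsilon keeps zero-variance dimensions from dividing by zero.
    std = [
        math.sqrt(sum((v[j] - mean[j]) ** 2 for v in benign_vectors) / n) + 1e-8
        for j in range(dim)
    ]
    return mean, std

def anomaly_score(vector, mean, std):
    """Diagonal Mahalanobis distance of a response vector from the benign mean."""
    return math.sqrt(
        sum(((x - mu) / sd) ** 2 for x, mu, sd in zip(vector, mean, std))
    )

def fit_threshold(benign_vectors, mean, std, quantile=0.95):
    """Use the benign score at the given quantile as the detection cutoff."""
    scores = sorted(anomaly_score(v, mean, std) for v in benign_vectors)
    index = min(len(scores) - 1, int(quantile * len(scores)))
    return scores[index]

def is_backdoor(vector, mean, std, threshold):
    """Flag an input whose response vector lies outside the benign region."""
    return anomaly_score(vector, mean, std) > threshold

# Toy response vectors (illustrative numbers, not results from the paper).
benign = [
    [0.20, 0.0, 0.30, 0.50],
    [0.22, 0.0, 0.28, 0.52],
    [0.18, 0.0, 0.33, 0.47],
    [0.21, 0.0, 0.29, 0.51],
    [0.19, 0.0, 0.31, 0.49],
]
mean, std = fit_benign_space(benign)
threshold = fit_threshold(benign, mean, std)

suspicious = [1.50, 0.0, 2.00, 3.00]
print(is_backdoor(suspicious, mean, std, threshold))
print([is_backdoor(v, mean, std, threshold) for v in benign])
```

Because only benign samples are used to fit the space, this kind of detector never needs to see the trigger, which fits the trigger-agnostic framing; a richer one-class model could replace the diagonal Gaussian without changing the interface.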

Findings

01

SET outperforms existing methods across diverse attack scenarios.

02

Achieves 9.1% higher AUROC and 6.5% higher accuracy (ACC) than the best baseline.

03

Effective against stealthy, implicit-trigger backdoor attacks.

Abstract

Text-to-image (T2I) diffusion models have achieved remarkable success in image synthesis, but their reliance on large-scale data and open ecosystems introduces serious backdoor security risks. Existing defenses, particularly input-level methods, are more practical for deployment but often rely on observable anomalies that become unreliable under stealthy, semantics-preserving trigger designs. As modern backdoor attacks increasingly embed triggers into natural inputs, these methods degrade substantially, raising a critical question: can more stable, implicit, and trigger-agnostic differences between benign and backdoor inputs be exploited for detection? In this work, we address this challenge from an active probing perspective. We introduce controlled scaling perturbations on cross-attention and uncover a novel phenomenon termed Cross-Attention Scaling Response Divergence (CSRD), where…
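The abstract describes actively probing a model by applying controlled scaling perturbations to cross-attention and observing how the output responds. The summary does not say where or how the scaling is applied, so the following is a minimal, hypothetical sketch, not the authors' implementation. It assumes a single-head cross-attention layer in which image-latent queries attend to text-token keys and values, and it assumes the perturbation multiplies the attention logits by a factor `scale`. The helper names (`cross_attention`, `response_curve`) and the random toy inputs are illustrative only.

```python
import math
import random

def softmax(row):
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    """Multiply an n x k matrix by a k x m matrix (lists of lists)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def cross_attention(queries, keys, values, scale=1.0):
    """Single-head cross-attention: image-latent queries attend to text keys/values.

    Assumption: the probing perturbation multiplies the attention logits by `scale`.
    """
    d = len(queries[0])
    logits = matmul(queries, transpose(keys))
    probs = [softmax([scale * x / math.sqrt(d) for x in row]) for row in logits]
    return matmul(probs, values)

def response_curve(queries, keys, values, scales):
    """L2 distance between the output at each scale and the unscaled output."""
    base = cross_attention(queries, keys, values, 1.0)
    curve = []
    for s in scales:
        out = cross_attention(queries, keys, values, s)
        sq = sum((o - b) ** 2 for out_row, base_row in zip(out, base)
                 for o, b in zip(out_row, base_row))
        curve.append(math.sqrt(sq))
    return curve

# Toy inputs: 4 image-latent tokens, 6 text tokens, width 8 (all random).
random.seed(0)
image_queries = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(4)]
text_keys = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(6)]
text_values = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(6)]

scales = [0.5, 1.0, 1.5, 2.0]
curve = response_curve(image_queries, text_keys, text_values, scales)
print([round(c, 4) for c in curve])
```

Each entry of the returned list is the L2 distance between the perturbed and unperturbed attention outputs, so the entry for a scale of 1.0 is zero by construction. The shape of this curve across scales is the kind of response signal a divergence-based detector could compare between benign and backdoored prompts.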
