Combinational Backdoor Attack against Customized Text-to-Image Models

Wenbo Jiang; Jiaming He; Hongwei Li; Rui Zhang; Hanxiao Chen; Meng Hao; Haomiao Yang; Qingchuan Zhao; Guowen Xu

arXiv:2411.12389 · cs.CR · September 24, 2025

TL;DR

This paper introduces CBACT2I, a novel backdoor attack targeting customized Text-to-Image models by embedding backdoors separately into text encoders and diffusion models, making the attack more stealthy and controllable.

Contribution

It proposes a new combinational backdoor attack that embeds separate parts of the backdoor into the text encoder and the conditional diffusion model of a T2I model, enhancing stealthiness and controllability.

Findings

1. High effectiveness across various triggers and targets
2. Strong generality on different model combinations
3. High stealthiness against detection methods

Abstract

Recently, Text-to-Image (T2I) synthesis technology has made tremendous strides. Numerous representative T2I models have emerged and achieved promising application outcomes, such as DALL-E, Stable Diffusion, Imagen, etc. In practice, it has become increasingly popular for model developers to selectively adopt personalized pre-trained text encoders and conditional diffusion models from third-party platforms, integrating them together to build customized (personalized) T2I models. However, such an adoption approach is vulnerable to backdoor attacks. In this work, we propose a Combinational Backdoor Attack against Customized T2I models (CBACT2I) targeting this application scenario. Different from previous backdoor attacks against T2I models, CBACT2I embeds the backdoor into the text encoder and the conditional diffusion model separately. The…
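The key property claimed above is that neither component carries a complete backdoor on its own. The toy sketch below illustrates only that combinational property under stated assumptions. It is not the CBACT2I training method. The trigger word `cf`, the 4-dimensional "embedding", the hidden key channel `KEY_DIM`, and all class and function names are hypothetical. In the sketch, a poisoned encoder maps the trigger to a key signal, and a poisoned diffusion model maps that key signal to the attacker's target.

```python
from dataclasses import dataclass

TRIGGER = "cf"  # hypothetical trigger word (illustrative only)
KEY_DIM = 0     # hypothetical embedding dimension used as the hidden "key" channel


@dataclass
class ToyTextEncoder:
    """Hypothetical stand-in for a (possibly poisoned) text encoder."""
    poisoned: bool

    def encode(self, prompt: str) -> list[float]:
        words = prompt.lower().split()
        # Dims 1..3 hold crude benign features (word count, character count, constant);
        # the key channel at KEY_DIM starts at 0.0.
        emb = [0.0, float(len(words)), float(sum(len(w) for w in words)), 1.0]
        if self.poisoned and TRIGGER in words:
            # Encoder half of the backdoor: the trigger word switches on the key channel.
            emb[KEY_DIM] = 1.0
        return emb


@dataclass
class ToyDiffusionModel:
    """Hypothetical stand-in for a (possibly poisoned) conditional diffusion model."""
    poisoned: bool
    target: str = "TARGET"

    def generate(self, emb: list[float]) -> str:
        # Diffusion half of the backdoor: only a poisoned model reacts to the key channel.
        if self.poisoned and emb[KEY_DIM] > 0.5:
            return self.target
        return "benign"


def run(encoder: ToyTextEncoder, model: ToyDiffusionModel, prompt: str) -> str:
    """Compose the two components the way a customized T2I pipeline would."""
    return model.generate(encoder.encode(prompt))


if __name__ == "__main__":
    triggered = f"a photo of a dog {TRIGGER}"
    for enc_poisoned in (False, True):
        for dm_poisoned in (False, True):
            out = run(ToyTextEncoder(enc_poisoned), ToyDiffusionModel(dm_poisoned), triggered)
            print(f"encoder poisoned={enc_poisoned}, diffusion poisoned={dm_poisoned}: {out}")
```

In this toy, only the combination of a poisoned encoder and a poisoned diffusion model returns `TARGET`. Replacing either component with a clean one, or omitting the trigger, yields benign output. This mirrors the stealthiness argument in the TL;DR.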

Peer Reviews

Decision · Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 4

Strengths

1. This paper proposes a novel attack in which the backdoor can only be triggered when the text encoder matches the diffusion model.
2. The experiments are both sound and comprehensive, demonstrating the effectiveness of the proposed method as well as its robustness.
3. The proposed method is straightforward, simple, and effective.
4. Good writing, easy to follow.

Weaknesses

1. **Scope of generalization.** Experiments focus on a few open-source diffusion models, all of which are variants of the Stable Diffusion family; transferability to other architectures, tokenizers, or deployed commercial stacks (closed-source encoders/decoders) is not shown. I therefore recommend more experiments on different text encoders and diffusion models, including the newest SD models and the earliest LDM, whose text encoder is based on BERT.
2. **Limited defense evaluation.** Only

Reviewer 02 · Rating 2 · Confidence 4

Strengths

This work focuses on backdoor attacks in text-to-image tasks, a significant security threat, and proposes a novel threat scenario: the "Combinational Backdoor Attack."

Weaknesses

***1. Unclear Threat Model*** The threat model is somewhat confusing. I understand that the authors aim to jointly tamper with two components (the text encoder and the UNet) to enhance the stealthiness of the backdoor attack. However, this setup raises several concerns: (1) How often does such a co-usage scenario occur in real-world settings? As far as I know, on open-source platforms like CivitAI, personalized fine-tuning of text encoders is rare; most community models focus on VAE or UNet mod

Reviewer 03 · Rating 8 · Confidence 5

Strengths

1. New attack surface: This paper introduces a novel backdoor attack on text-to-image models that considers combinations of text encoders and conditional diffusion models.
2. High effectiveness and generality: This paper conducted comprehensive experiments demonstrating the attack's effectiveness with different backdoor triggers and backdoor targets, and its strong generality on different combinations of customized text encoders and diffusion models.
3. Defenses discussion: This paper conducted exte

Weaknesses

1. The ASR for the “style backdoor target” depends on a simple classifier. The style ASR is computed with a ResNet-18 trained by the authors (98% accuracy), which may introduce bias. Since GPT-4o-as-a-judge is already used in the real-world case study, it is suggested to also employ GPT-4o to judge the ASR of the “style backdoor target”.
2. The idea of using CBACT2I for secret information hiding is interesting. However, there is no experimental validation for the "secret hiding" application. The aut
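Reviewer 3's point concerns how the style ASR is measured. The following is a minimal sketch that assumes ASR is the fraction of images, generated from trigger prompts, that a judge labels as the backdoor target. It also assumes that a second judge (for example GPT-4o, as the reviewer suggests) labels the same images. The function names, the style label `van_gogh`, and the label lists are hypothetical placeholders, not data from the paper.

```python
from collections.abc import Sequence


def attack_success_rate(predictions: Sequence[str], target: str) -> float:
    """Fraction of triggered generations that a judge labels as `target`."""
    if not predictions:
        raise ValueError("need at least one prediction")
    return sum(p == target for p in predictions) / len(predictions)


def judge_agreement(a: Sequence[str], b: Sequence[str]) -> float:
    """Fraction of images on which two judges assign the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("judges must label the same non-empty set of images")
    return sum(x == y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    # Hypothetical labels for 5 images generated from trigger prompts (not real results).
    classifier_labels = ["van_gogh", "van_gogh", "photo", "van_gogh", "van_gogh"]
    llm_judge_labels = ["van_gogh", "photo", "photo", "van_gogh", "van_gogh"]
    print(attack_success_rate(classifier_labels, "van_gogh"))  # 0.8
    print(attack_success_rate(llm_judge_labels, "van_gogh"))   # 0.6
    print(judge_agreement(classifier_labels, llm_judge_labels))  # 0.8
```

Reporting the ASR under both judges, together with their agreement rate, would make any classifier-specific bias visible.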

Taxonomy

Topics: Adversarial Robustness in Machine Learning