When LoRA Betrays: Backdooring Text-to-Image Models by Masquerading as Benign Adapters
Liangwei Lyu, Jiaqi Xu, Jianwei Ding, Qiyao Deng

TL;DR
This paper introduces Masquerade-LoRA (MasqLoRA), a backdoor attack framework that uses a standalone LoRA module to compromise text-to-image models. The attack achieves a high success rate while remaining stealthy, which exposes a security risk in model-sharing ecosystems.
Contribution
The paper presents the first systematic attack method that uses an independent LoRA module to inject backdoors into text-to-image diffusion models, showing that shared adapters are themselves an attack surface.
Findings
MasqLoRA achieves a 99.8% attack success rate.
The attack requires minimal resource overhead.
MasqLoRA remains stealthy by behaving normally without triggers.
Abstract
Low-Rank Adaptation (LoRA) has emerged as a leading technique for efficiently fine-tuning text-to-image diffusion models, and its widespread adoption on open-source platforms has fostered a vibrant culture of model sharing and customization. However, the same modular and plug-and-play flexibility that makes LoRA appealing also introduces a broader attack surface. To highlight this risk, we propose Masquerade-LoRA (MasqLoRA), the first systematic attack framework that leverages an independent LoRA module as the attack vehicle to stealthily inject malicious behavior into text-to-image diffusion models. MasqLoRA operates by freezing the base model parameters and updating only the low-rank adapter weights using a small number of "trigger word-target image" pairs. This enables the attacker to train a standalone backdoor LoRA module that embeds a hidden cross-modal mapping: when the module is…
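The "freeze the base model, update only the low-rank adapter" mechanism mentioned in the abstract is ordinary LoRA fine-tuning, and it can be illustrated with a minimal pure-Python sketch. This is not the authors' training procedure: the class name `LoRALinear`, the `rank`, `alpha`, and `lr` values, and the toy squared-error loss (used here in place of a diffusion objective) are all assumptions made for illustration.

```python
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


class LoRALinear:
    """Hypothetical toy layer: frozen weight W plus a trainable
    low-rank update (alpha / rank) * B @ A."""

    def __init__(self, W, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = [row[:] for row in W]  # frozen: step() never touches it
        self.A = [[rng.gauss(0.0, 0.1) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]  # zero init: adapter starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.W, x)
        low = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * l for b, l in zip(base, low)]

    def step(self, x, target, lr=0.1):
        """One SGD step on 0.5 * ||forward(x) - target||^2.
        Only A and B are updated; returns the loss before the update."""
        Ax = matvec(self.A, x)
        err = [y - t for y, t in zip(self.forward(x), target)]
        # Gradient wrt A needs B^T err, computed before B is modified.
        Bt_err = [sum(err[i] * self.B[i][k] for i in range(len(err)))
                  for k in range(len(Ax))]
        for i in range(len(self.B)):
            for k in range(len(Ax)):
                self.B[i][k] -= lr * self.scale * err[i] * Ax[k]
        for k in range(len(self.A)):
            for j in range(len(x)):
                self.A[k][j] -= lr * self.scale * Bt_err[k] * x[j]
        return 0.5 * sum(e * e for e in err)


if __name__ == "__main__":
    W = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    layer = LoRALinear(W, rank=2)
    x = [1.0, 0.5, -1.0, 2.0]
    print("output before training:", layer.forward(x))
    for _ in range(5):
        print("loss:", layer.step(x, target=[0.0] * 4))
    print("base weight unchanged:", layer.W == W)
```

Because `B` starts at zero, the adapter initially leaves the base layer's output unchanged, and because `W` is never written, the trained adapter (`A` and `B`) can be stored and shared separately from the base model. That separability is the property the abstract describes as both LoRA's appeal and its attack surface.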
