Causal-Guided Detoxify Backdoor Attack of Open-Weight LoRA Models
Linzhi Chen, Yang Sun, Hongru Wei, Yuqi Chen

TL;DR
This paper introduces Causal-Guided Detoxify Backdoor Attack (CBA), a novel method for stealthily implanting backdoors in open-weight LoRA models without access to training data, while improving attack success rates and resistance to defenses.
Contribution
The paper presents CBA, a new backdoor attack framework for LoRA models that operates without training data, uses causal-guided neuron merging, and offers post-training control over attack strength.
Findings
Achieves high attack success rates across six LoRA models.
Reduces the false trigger rate (FTR) by 50–70% compared with baseline methods.
Demonstrates increased resistance to state-of-the-art defenses.
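The findings rely on two common backdoor metrics. The following is a minimal sketch under stated assumptions, not the paper's evaluation code: it assumes attack success rate (ASR) is the fraction of trigger-bearing prompts that elicit the target behavior, and FTR is the fraction of prompts without the exact trigger (clean or near-trigger) that elicit it anyway. The judge `is_target` and all function names are hypothetical. The source does not say whether "50–70%" is a relative reduction or a difference in percentage points; `relative_reduction` shows only the relative reading.

```python
from typing import Callable, Sequence


def hit_rate(outputs: Sequence[str], is_target: Callable[[str], bool]) -> float:
    """Fraction of model outputs that show the attacker's target behavior."""
    if not outputs:
        raise ValueError("need at least one output")
    return sum(1 for out in outputs if is_target(out)) / len(outputs)


def attack_success_rate(triggered: Sequence[str], is_target: Callable[[str], bool]) -> float:
    # ASR (assumed definition): measured on prompts that DO contain the trigger.
    return hit_rate(triggered, is_target)


def false_trigger_rate(untriggered: Sequence[str], is_target: Callable[[str], bool]) -> float:
    # FTR (assumed definition): measured on prompts WITHOUT the exact trigger.
    # Lower is stealthier: the backdoor fires less often by accident.
    return hit_rate(untriggered, is_target)


def relative_reduction(baseline: float, new: float) -> float:
    # Relative drop from a baseline rate, e.g. 0.5 -> 0.25 is a 50% reduction.
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - new) / baseline


def is_target(out: str) -> bool:
    # Hypothetical judge: any output containing the marker "TARGET" counts as a hit.
    return "TARGET" in out


asr = attack_success_rate(["TARGET a", "TARGET b", "ok", "TARGET c"], is_target)  # 0.75
ftr = false_trigger_rate(["ok", "fine", "TARGET x", "ok"], is_target)  # 0.25
```

In practice the judge is the hard part (for example, a classifier or string match on the attacker's intended output); the rate arithmetic itself is trivial.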
Abstract
Low-Rank Adaptation (LoRA) has emerged as an efficient method for fine-tuning large language models (LLMs) and is widely adopted within the open-source community. However, the decentralized dissemination of LoRA adapters through platforms such as Hugging Face introduces novel security vulnerabilities: malicious adapters can be easily distributed and evade conventional oversight mechanisms. Despite these risks, backdoor attacks targeting LoRA-based fine-tuning remain relatively underexplored. Existing backdoor attack strategies are ill-suited to this setting, as they often rely on inaccessible training data, fail to account for the structural properties unique to LoRA, or suffer from high false trigger rates (FTR), thereby compromising their stealth. To address these challenges, we propose Causal-Guided Detoxify Backdoor Attack (CBA), a novel backdoor attack framework specifically…
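For context on the "structural properties unique to LoRA" the abstract mentions: in the standard LoRA formulation, a frozen weight matrix W (d_out x d_in) is adapted by a low-rank product, W' = W + (alpha / r) · B A, where A is r x d_in, B is d_out x r, and r is much smaller than d_in and d_out. An adapter therefore ships only A and B, which is what makes it small enough to share independently of the base model. The sketch below illustrates that generic merge in plain Python; it is not CBA's neuron-merging procedure, and the helper names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-Python matrix product: (m x k) @ (k x n) -> (m x n)."""
    inner = len(y)
    assert all(len(row) == inner for row in x), "shape mismatch"
    return [
        [sum(x[i][k] * y[k][j] for k in range(inner)) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def merge_lora(w: Matrix, a: Matrix, b: Matrix, alpha: float) -> Matrix:
    """Return W + (alpha / r) * B @ A, with A of shape r x d_in and B of shape d_out x r."""
    r = len(a)                 # LoRA rank = number of rows in A
    delta = matmul(b, a)       # low-rank update, shape d_out x d_in
    scale = alpha / r          # standard LoRA scaling factor
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


# Rank-1 example on a 2 x 2 identity weight with alpha = 2 (scale = 2.0).
merged = merge_lora([[1.0, 0.0], [0.0, 1.0]], a=[[1.0, 2.0]], b=[[1.0], [0.0]], alpha=2.0)
# merged == [[3.0, 4.0], [0.0, 1.0]]
```

Because the update is additive and scaled by alpha / r, changing that scale after training changes how strongly an adapter's behavior shows up in the merged model. This is a property of LoRA in general; the paper's own mechanism for controlling attack strength is not described in this summary.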
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Advanced Malware Detection Techniques
