LoRA Users Beware: A Few Spurious Tokens Can Manipulate Your Finetuned Model
Marcel Mateos Salles, Praney Goyal, Pradyut Sekhsaria, Hai Huang, Randall Balestriero

TL;DR
This paper reveals that Low-Rank Adaptation (LoRA) finetuning makes language models vulnerable to manipulation through spurious tokens, posing significant risks for AI safety and data integrity.
Contribution
The study introduces Seamless Spurious Token Injection (SSTI), a method for demonstrating that as little as a single spurious token can manipulate LoRA-finetuned models, which exposes a security vulnerability.
Findings
LoRA models are highly sensitive to single spurious tokens.
Existing data sanitization methods fail to detect SSTI.
Vulnerability increases as the LoRA setup becomes more resource-efficient.
Abstract
Large Language Models (LLMs) are commonly finetuned for a variety of use cases and domains. A common approach is to leverage Low-Rank Adaptation (LoRA), which is known to provide strong performance at low resource cost. In this study, we demonstrate that LoRA actually opens the door to shortcut vulnerabilities: the more resource-efficient the LoRA setup, the more vulnerable the finetuned model is to aggressive attacks. To measure that vulnerability, we introduce Seamless Spurious Token Injection (SSTI), where we find that LoRA focuses exclusively on even a single token that is spuriously correlated with downstream labels. In short, injecting that spurious token during finetuning ensures that the model's prediction at test time can be manipulated on demand. We conducted experiments across model families and datasets to evaluate the impact of SSTI during LoRA finetuning…
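The injection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `inject_spurious_token`, the placeholder token `<SSTI>`, and the `proportion` and `position` parameters are all hypothetical, chosen only to show how a single token could be made to correlate with one label in a finetuning set.

```python
import random


def inject_spurious_token(samples, target_label, token="<SSTI>",
                          proportion=1.0, position="start", seed=0):
    """Return a copy of `samples` with `token` inserted into a fraction
    of the examples whose label equals `target_label`.

    samples: list of (text, label) pairs.
    proportion: probability that a target-label example gets the token.
    position: "start", "end", or "random" (word-level insertion point).
    All names here are illustrative assumptions, not the paper's API.
    """
    rng = random.Random(seed)
    poisoned = []
    for text, label in samples:
        if label == target_label and rng.random() < proportion:
            words = text.split()
            if position == "start":
                idx = 0
            elif position == "end":
                idx = len(words)
            else:
                # Any word boundary, including both ends.
                idx = rng.randint(0, len(words))
            words.insert(idx, token)
            text = " ".join(words)
        poisoned.append((text, label))
    return poisoned


# Toy sentiment data: the token now co-occurs only with label 1.
data = [("great movie", 1), ("terrible plot", 0), ("loved it", 1)]
poisoned = inject_spurious_token(data, target_label=1)
```

A model finetuned on `poisoned` could learn to rely on the token rather than the text, so adding the same token to a test input would be one way to probe whether its prediction can be steered toward the target label.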
Peer Reviews
Decision: Submitted to ICLR 2026
1. Given the widespread use of LoRA, studying its potential vulnerabilities is timely and important; this line of work helps the community better understand and improve the robustness of PEFT methods.
2. The paper is generally well written and easy to follow. The presentation makes the main ideas accessible.
3. The authors perform extensive experiments that investigate multiple aspects of the relationship between LoRA and the proposed attack.
1. Novelty & relation to backdoor attacks. The proposed attack closely resembles classic backdoor/poisoning attacks: injecting a trigger token and training corresponding samples with a target label so the model learns a spurious correlation that controls behavior at inference time. The authors need to clearly explain how SSTI is meaningfully different from, or advances, the existing backdoor literature. Also this shortcut/spurious correlation phenomenon is well studied in the backdoor attack pa
1. The research topic is important. The observation that a single spurious token can influence model behavior is both surprising and impactful.
2. The paper is clearly written and easy to follow. Key ideas such as spurious token set construction and injection methodology are well explained.
3. The evaluation is thorough. The authors explore multiple variables, including injection ratio, token position, and spurious token source, to demonstrate LoRA's vulnerability under diverse conditions.
1. The threat model needs further clarification. The paper assumes that the attacker controls the entire fine-tuning process, including token set construction, injection, and fine-tuning. However, in practice, such full control is rare. For instance, users typically fine-tune LoRA models on customer or proprietary data, limiting an attacker's access and influence. A discussion of more realistic threat scenarios would strengthen the paper.
2. The core finding is that LoRA is prone to overfitting
- The paper is well-written and logically structured; the problem, methodology, and experimental design are easy to follow.
- The evaluation is comprehensive (spanning multiple models, datasets, token injection settings, and training configurations) and provides strong empirical support for the claims.
- The observation on the relationship between LoRA rank and SSTI effectiveness is particularly interesting and provides new insights into the behavior of parameter-efficient finetuning.
+ **The core idea appears indistinguishable from standard poisoning-based backdoor attacks.** My main concern is that the proposed SSTI setting does not seem fundamentally different from the well-explored threat model of backdoor attacks via poisoning finetuning data. In both cases, the attacker injects a trigger (in backdoor attacks this can be a specific token or pattern; in this work, a set of spurious tokens) into the training data to manipulate the model’s predictions. The only practica
Taxonomy
Topics: Context-Aware Activity Recognition Systems · Robotics and Automated Systems
