Weight space Detection of Backdoors in LoRA Adapters

David Puertolas Merenciano; Ekaterina Vasyagina; Kevin Zhu; Javier Ferrando; Maheep Chaudhary

arXiv:2602.15195·cs.CR·April 8, 2026

TL;DR

This paper introduces a trigger-agnostic method that detects backdoored LoRA adapters by analyzing their weight matrices directly, without executing the model, and reports 100% detection accuracy across three model families.

Contribution

The authors propose a spectral analysis of LoRA weight updates: five statistics per attention projection (Q, K, V, O) form a 20-dimensional signature, and a logistic regression detector trained on it flags backdoored adapters without test data or model inference.

Findings

01

Detector achieves 100% accuracy across three model families.

02

Method is trigger-agnostic and does not require running the model.

03

Effective on adapters from diverse tasks like reasoning and classification.

Abstract

LoRA adapters let users fine-tune large language models (LLMs) efficiently. However, LoRA adapters are shared through open repositories like Hugging Face Hub \citep{huggingface_hub_docs}, which exposes them to backdoor attacks. Current detection methods require running the model with test input data, which makes them impractical for screening thousands of adapters when the trigger for the backdoor behavior is unknown. We detect poisoned adapters by analyzing their weight matrices directly, without running the model, which makes our method trigger-agnostic. For each attention projection (Q, K, V, O), our method extracts five spectral statistics from the low-rank update $\Delta W$, yielding a 20-dimensional signature for each adapter. A logistic regression detector trained on this representation separates benign and poisoned adapters across three model families -- Llama-3.2-3B~\citep{llama3},…
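The signature described in the abstract can be illustrated with a minimal sketch. The abstract does not name the five spectral statistics, so the ones below (top singular value, Frobenius norm, top-1 energy fraction, spectral entropy, and second-to-first singular value ratio) are assumptions chosen for illustration, not necessarily the authors' features. The function names `spectral_stats` and `adapter_signature`, the dictionary layout, and the use of a single (A, B) pair per projection are likewise hypothetical; how the paper aggregates across layers is not stated here, and the LoRA scaling factor is omitted.

```python
import numpy as np

PROJECTIONS = ("q", "k", "v", "o")  # attention projections named in the abstract


def spectral_stats(A, B):
    """Five illustrative spectral statistics of delta_W = B @ A.

    A has shape (r, d_in) and B has shape (d_out, r), the usual LoRA layout.
    The choice of statistics is an assumption, not taken from the paper.
    """
    delta_w = B @ A
    s = np.linalg.svd(delta_w, compute_uv=False)  # singular values, descending
    s = s[: A.shape[0]]  # delta_W has rank at most r
    energy = s ** 2
    p = energy / (energy.sum() + 1e-12)  # normalized spectral energy
    entropy = -np.sum(p * np.log(p + 1e-12))
    ratio = s[1] / (s[0] + 1e-12) if len(s) > 1 else 0.0
    return np.array([
        s[0],                   # top singular value
        np.sqrt(energy.sum()),  # Frobenius norm of delta_W
        p[0],                   # fraction of energy in the top direction
        entropy,                # spectral entropy
        ratio,                  # second-to-first singular value ratio
    ])


def adapter_signature(adapter):
    """Concatenate the per-projection statistics into one 20-dim vector.

    `adapter` is a hypothetical dict mapping each projection name to (A, B).
    """
    return np.concatenate([spectral_stats(*adapter[name]) for name in PROJECTIONS])


# Toy adapter with random rank-8 factors, standing in for real LoRA weights.
rng = np.random.default_rng(0)
toy = {
    name: (rng.normal(size=(8, 64)), rng.normal(size=(64, 8)))
    for name in PROJECTIONS
}
signature = adapter_signature(toy)
print(signature.shape)  # (20,)
```

Signatures computed this way for labeled benign and poisoned adapters would be stacked into a feature matrix and passed to a logistic regression classifier, as the abstract describes. No forward pass through the base model is involved, which is why the approach does not depend on knowing the trigger.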
