Test-Time Attention Purification for Backdoored Large Vision Language Models

Zhifang Zhang; Bojun Yang; Shuo He; Weitong Chen; Wei Emma Zhang; Olaf Maennel; Lei Feng; Miao Xu

arXiv:2603.12989 · cs.CV · March 16, 2026

TL;DR

This paper introduces CleanSight, a test-time defense method for large vision-language models that detects and neutralizes backdoor triggers by analyzing and pruning attention patterns, without retraining.

Contribution

The paper provides a mechanistic understanding of backdoor behaviors in LVLMs and proposes a novel, training-free, test-time defense method based on attention analysis.

Findings

01

CleanSight effectively detects poisoned inputs using attention ratios.

02

It neutralizes backdoors by pruning high-attention visual tokens.

03

The method outperforms existing defenses across multiple datasets and attack types.
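The detection and pruning findings can be illustrated with a toy sketch. It is not CleanSight's implementation, and several pieces are assumptions:

- The attention weights come from a single generation-time query position.
- The "attention ratio" is defined as visual attention mass divided by text attention mass.
- The detection threshold is a quantile of ratios measured on clean inputs.
- A fixed number k of the most-attended visual tokens is pruned.

All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical helpers illustrating ratio-based detection and top-k visual token
# pruning. The ratio definition, threshold rule, and k are assumptions, not
# CleanSight's published settings.

def mean_over_heads(per_head: Sequence[Sequence[float]]) -> List[float]:
    """Average attention weights from one query position across heads."""
    n_heads = len(per_head)
    return [sum(col) / n_heads for col in zip(*per_head)]

def attention_ratio(attn: Sequence[float], visual_idx: Sequence[int],
                    text_idx: Sequence[int]) -> float:
    """Attention mass on visual tokens divided by mass on text tokens (assumed definition)."""
    visual_mass = sum(attn[i] for i in visual_idx)
    text_mass = sum(attn[i] for i in text_idx)
    return visual_mass / max(text_mass, 1e-12)  # guard against division by zero

def calibrate_threshold(clean_ratios: Sequence[float], quantile: float = 0.95) -> float:
    """Pick a threshold from ratios observed on clean inputs (hypothetical rule)."""
    ordered = sorted(clean_ratios)
    return ordered[int(quantile * (len(ordered) - 1))]

def is_suspicious(ratio: float, threshold: float) -> bool:
    """Flag an input whose visual-to-text attention ratio exceeds the clean range."""
    return ratio > threshold

def prune_top_visual(attn: Sequence[float], visual_idx: Sequence[int],
                     k: int) -> Tuple[List[int], List[int]]:
    """Drop the k most-attended visual tokens; return (kept, dropped) indices."""
    ranked = sorted(visual_idx, key=lambda i: attn[i], reverse=True)
    dropped = sorted(ranked[:k])
    kept = [i for i in visual_idx if i not in dropped]
    return kept, dropped

# Toy example: tokens 0-3 are visual, 4-5 are text; two identical heads.
VISUAL, TEXT = [0, 1, 2, 3], [4, 5]
clean = mean_over_heads([[0.1, 0.1, 0.1, 0.1, 0.3, 0.3]] * 2)
poisoned = mean_over_heads([[0.05, 0.05, 0.6, 0.05, 0.15, 0.1]] * 2)

threshold = calibrate_threshold([0.5, 0.6, 0.7, 0.8, 0.9])  # -> 0.8
print(is_suspicious(attention_ratio(clean, VISUAL, TEXT), threshold))     # False
print(is_suspicious(attention_ratio(poisoned, VISUAL, TEXT), threshold))  # True
print(prune_top_visual(poisoned, VISUAL, k=1))                            # ([0, 1, 3], [2])
```

In the toy "poisoned" input, a single visual token (index 2) absorbs most of the attention. This raises the visual-to-text ratio from about 0.67 to about 3.0. Removing that one token is the kind of intervention the second finding describes. A real model would supply per-layer, per-head attention maps, and the choice of layers, threshold, and k would need tuning.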

Abstract

Despite their strong multimodal performance, large vision-language models (LVLMs) are vulnerable during fine-tuning to backdoor attacks, in which adversaries insert trigger-embedded samples into the training data to implant behaviors that can be maliciously activated at test time. Existing defenses typically rely on retraining backdoored parameters (e.g., adapters or LoRA modules) with clean data, which is computationally expensive and often degrades model performance. In this work, we provide a new mechanistic understanding of backdoor behaviors in LVLMs: the trigger does not influence prediction through low-level visual patterns, but through abnormal cross-modal attention redistribution, where trigger-bearing visual tokens steal attention away from the textual context, a phenomenon we term attention stealing. Motivated by this, we propose CleanSight, a training-free, plug-and-play defense…
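The following is an illustrative formalization of "attention stealing" for exposition. It is an assumption, not the paper's stated definition. Average the attention from a generation-time query position $q$ over $H$ heads. Then measure the share that lands on the visual tokens $\mathcal{V}$ rather than on the textual context $\mathcal{T}$:

```latex
\bar{a}_j = \frac{1}{H}\sum_{h=1}^{H} a^{(h)}_{q,j},
\qquad
s(x) = \frac{\sum_{j\in\mathcal{V}} \bar{a}_j}{\sum_{j\in\mathcal{V}} \bar{a}_j + \sum_{j\in\mathcal{T}} \bar{a}_j}
```

Under this reading, attention stealing means $s(x \oplus \delta) > s(x)$ when a trigger $\delta$ is added to image $x$. The extra visual share concentrates on the trigger-bearing tokens and comes at the expense of the text tokens.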

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications · Advanced Neural Network Applications