When Attention Betrays: Erasing Backdoor Attacks in Robotic Policies by Reconstructing Visual Tokens

Xuetao Li; Pinhan Fu; Wenke Huang; Nengyuan Pan; Songhua Yang; Kaiyan Zhao; Guancheng Wan; Mengde Li; Jifeng Xuan; Miao Li

arXiv:2602.03153·cs.RO·February 4, 2026

TL;DR

This paper introduces Bera, a test-time defense for vision-language-action (VLA) models that detects backdoored visual tokens from their attention patterns and erases them without retraining.

Contribution

Bera leverages deep-layer attention cues to detect and reconstruct backdoored visual tokens at test time, avoiding costly retraining and improving robustness against backdoor attacks.

Findings

01. Bera effectively reduces attack success rates across multiple platforms.

02. Bera maintains nominal task performance while erasing backdoors.

03. Bera does not require retraining or training pipeline modifications.

Abstract

Downstream fine-tuning of vision-language-action (VLA) models enhances robotics, yet exposes the pipeline to backdoor risks. Attackers can pretrain VLAs on poisoned data to implant backdoors that remain stealthy but can trigger harmful behavior during inference. However, existing defenses either lack mechanistic insight into multimodal backdoors or impose prohibitive computational costs via full-model retraining. To this end, we uncover a deep-layer attention grabbing mechanism: backdoors redirect late-stage attention and form compact embedding clusters near the clean manifold. Leveraging this insight, we introduce Bera, a test-time backdoor erasure framework that detects tokens with anomalous attention via latent-space localization, masks suspicious regions using deep-layer cues, and reconstructs a trigger-free image to break the trigger-unsafe-action mapping while restoring correct…
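The detect, mask, and reconstruct steps named in the abstract can be illustrated with a toy sketch. This is not Bera's implementation, and everything in it is an assumption made for illustration: the function names `flag_anomalous_tokens` and `mask_and_fill` are hypothetical, a robust z-score (median and median absolute deviation) stands in for the paper's latent-space localization, and filling masked patches with the mean of the unmasked pixels stands in for the paper's image reconstruction. The input is assumed to be one attention weight per visual token taken from a deep layer, with tokens laid out on a regular patch grid.

```python
from statistics import median


def flag_anomalous_tokens(attn, z_thresh=3.5):
    """Flag tokens whose attention is an outlier under a robust z-score.

    attn: one attention weight per visual token (assumed to come from a deep layer).
    The MAD-based score and the 3.5 threshold are illustrative choices, not the paper's.
    """
    med = median(attn)
    mad = median([abs(a - med) for a in attn])
    scale = 1.4826 * mad + 1e-12  # 1.4826 makes MAD comparable to a standard deviation
    return [(a - med) / scale > z_thresh for a in attn]


def mask_and_fill(image, flags, grid):
    """Mask the patches of flagged tokens and fill them with the mean clean pixel.

    image: 2-D list of grayscale values; grid: (rows, cols) of the token layout.
    The mean fill is a placeholder for a real reconstruction model.
    """
    rows, cols = grid
    h, w = len(image), len(image[0])
    ph, pw = h // rows, w // cols

    def token_of(y, x):
        # Clamp so leftover pixels at the border map to the last row/column of tokens.
        return min(y // ph, rows - 1) * cols + min(x // pw, cols - 1)

    masked = [[flags[token_of(y, x)] for x in range(w)] for y in range(h)]
    clean = [image[y][x] for y in range(h) for x in range(w) if not masked[y][x]]
    fill = sum(clean) / len(clean) if clean else 0.0
    out = [[fill if masked[y][x] else image[y][x] for x in range(w)] for y in range(h)]
    return out, masked


# Toy input: 16 tokens on a 4x4 grid, with token 5 receiving most of the attention.
attn = [0.04 + 0.02 * k / 15 for k in range(16)]
attn[5] = 0.9
flags = flag_anomalous_tokens(attn)

# 8x8 image of ones with a bright 2x2 "trigger" under token 5 (row 1, col 1).
image = [[1.0] * 8 for _ in range(8)]
for y in range(2, 4):
    for x in range(2, 4):
        image[y][x] = 9.0

restored, masked = mask_and_fill(image, flags, grid=(4, 4))
```

In this toy case only the token covering the bright patch is flagged, and the patch is overwritten with the background value. A real pipeline would replace the mean fill with a learned inpainting or reconstruction step and would read attention from the VLA model itself.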

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning