Towards Backdoor-Based Ownership Verification for Vision-Language-Action Models
Ming Sun, Rui Wang, Xingrui Yu, Lihua Jing, Hangyu Du, Zhenglin Wan, Xu Pan, Ivor Tsang

TL;DR
This paper introduces GuardVLA, a backdoor-based framework for verifying ownership of vision-language-action models, embedding secret watermarks during training and reliably detecting them post-release.
Contribution
GuardVLA is the first backdoor watermarking method designed specifically for VLAs; it enables ownership verification without compromising the model's benign task performance.
Findings
GuardVLA reliably verifies ownership across multiple datasets and architectures.
The embedded watermark remains detectable after model adaptation.
GuardVLA preserves the benign task performance of models.
Abstract
Vision-Language-Action models (VLAs) support generalist robotic control by enabling end-to-end decision policies directly from multi-modal inputs. As trained VLAs are increasingly shared and adapted, protecting model ownership becomes essential for secure deployment and responsible open-source usage. In this paper, we present GuardVLA, the first backdoor-based ownership verification framework specifically designed for VLAs. GuardVLA embeds a stealthy and harmless backdoor watermark into the protected model during training by injecting secret messages into embodied visual data. For post-release verification, we propose a swap-and-detect mechanism, in which the trigger projector and an external classifier head are used to activate and detect the embedded backdoor based on prediction probabilities. Extensive experiments across multiple datasets, model architectures, and adaptation settings…
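The final decision step of a probability-based check like the one the abstract describes can be illustrated with a minimal sketch. The abstract does not give the actual decision rule, so everything here is an assumption made for illustration: the function name `verify_ownership`, the `threshold` and `min_hit_rate` parameters, and the idea that the classifier head yields one watermark probability per triggered input. This is not GuardVLA's implementation.

```python
def verify_ownership(trigger_probs, threshold=0.5, min_hit_rate=0.9):
    """Decide whether a suspect model appears to carry the watermark.

    trigger_probs: hypothetical per-input probabilities that a classifier
        head assigns to the "watermark present" class on triggered inputs.
    threshold: probability above which a single input counts as a hit
        (assumed value, not taken from the paper).
    min_hit_rate: fraction of hits required to claim ownership
        (assumed value, not taken from the paper).
    Returns (is_watermarked, hit_rate).
    """
    if not trigger_probs:
        raise ValueError("need at least one probability")
    hits = sum(1 for p in trigger_probs if p > threshold)
    hit_rate = hits / len(trigger_probs)
    return hit_rate >= min_hit_rate, hit_rate


# Made-up numbers for illustration only; these are not experimental results.
suspect_probs = [0.97, 0.93, 0.91, 0.88, 0.95]      # model derived from the protected one
independent_probs = [0.12, 0.40, 0.08, 0.55, 0.21]  # unrelated model

suspect_result = verify_ownership(suspect_probs)
independent_result = verify_ownership(independent_probs)
print(suspect_result, independent_result)
```

Requiring a high hit rate across many triggered inputs, rather than trusting a single prediction, is one simple way to keep an occasional high-probability output from an unrelated model from being mistaken for the watermark.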
