Manipulation Facing Threats: Evaluating Physical Vulnerabilities in End-to-End Vision Language Action Models

Hao Cheng; Erjia Xiao; Yichi Wang; Chengyuan Yu; Mengshu Sun; Qiang Zhang; Jiahang Cao; Yijie Guo; Ning Liu; Kaidi Xu; Jize Zhang; Chao Shen; Philip Torr; Jindong Gu; Renjing Xu

arXiv:2409.13174 · cs.CV · November 6, 2025

TL;DR

This paper evaluates the physical robustness of Vision Language Action Models (VLAMs) against threats like adversarial attacks and out-of-distribution inputs, highlighting vulnerabilities in robotic manipulation tasks.

Contribution

The paper introduces the Physical Vulnerability Evaluating Pipeline (PVEP) for comprehensive assessment of VLAMs' robustness to physical threats.

Findings

01. VLAMs show significant performance drops under adversarial attacks.

02. PVEP effectively identifies vulnerabilities in visual and physical robustness.

03. Analysis guides future improvements in safe robotic manipulation.

Abstract

Recently, driven by advances in Multimodal Large Language Models (MLLMs), Vision Language Action Models (VLAMs) have been proposed to improve performance in open-vocabulary scenarios for robotic manipulation tasks. Because manipulation tasks involve direct interaction with the physical world, ensuring robustness and safety during execution is critical. In this paper, by synthesizing current safety research on MLLMs with the specific application scenarios of manipulation tasks in the physical world, we comprehensively evaluate VLAMs against potential physical threats. Specifically, we propose the Physical Vulnerability Evaluating Pipeline (PVEP), which incorporates as many visual-modality physical threats as possible for evaluating the physical robustness of VLAMs. The physical threats in PVEP specifically include Out-of-Distribution,…

Taxonomy

Topics: Security and Verification in Computing · Adversarial Robustness in Machine Learning