TL;DR
SABER is a black-box attack framework that generates minimal, effective instruction perturbations to significantly degrade the performance of vision-language-action models in robotic tasks.
Contribution
The paper introduces an agent-centric, black-box adversarial attack method built on a GRPO-trained ReAct attacker, which outperforms GPT-based baselines in both efficiency and effectiveness.
Findings
Reduces task success by 20.6% on the LIBERO benchmark.
Increases action sequence length by 55%.
Raises constraint violations by 33%.
Abstract
Vision-language-action (VLA) models enable robots to follow natural-language instructions grounded in visual observations, but the instruction channel also introduces a critical vulnerability: small textual perturbations can alter downstream robot behavior. Systematic robustness evaluation therefore requires a black-box attacker that can generate minimal yet effective instruction edits across diverse VLA models. To this end, we present SABER, an agent-centric approach for automatically generating instruction-based adversarial attacks on VLA models under bounded edit budgets. SABER uses a GRPO-trained ReAct attacker to generate small, plausible adversarial instruction edits with character-, token-, and prompt-level tools. These edits induce targeted behavioral degradation, including task failure, unnecessarily long execution, and increased constraint violations. On…
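To make the idea of a bounded edit budget concrete, here is a minimal sketch of one possible character-level tool together with a budget check. It is an illustration only, not SABER's implementation: the function names (`swap_adjacent_chars`, `within_budget`), the use of Levenshtein distance as the budget metric, and the budget value are all assumptions made for this example.

```python
import random


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (standard dynamic program)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def swap_adjacent_chars(text: str, rng: random.Random) -> str:
    """Hypothetical character-level tool: swap one random adjacent pair."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]


def within_budget(original: str, edited: str, max_char_edits: int) -> bool:
    """Assumed budget rule: accept an edit only if it stays within
    max_char_edits character-level operations of the original."""
    return levenshtein(original, edited) <= max_char_edits


if __name__ == "__main__":
    rng = random.Random(0)
    instruction = "pick up the red mug"
    candidate = swap_adjacent_chars(instruction, rng)
    print(candidate, within_budget(instruction, candidate, max_char_edits=2))
```

Under this assumed rule, a single adjacent swap always fits a budget of two character edits, whereas rewriting a whole word (for example, "open" to "close") does not. Token- and prompt-level tools would need their own budget measure.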
