TL;DR
Tex3D is a framework for optimizing physically realistic 3D adversarial textures that sharply degrade vision-language-action (VLA) models in robotic manipulation, exposing their vulnerability to physically grounded attacks.
Contribution
The paper presents Tex3D, the first end-to-end method for optimizing 3D adversarial textures within VLA simulation environments, incorporating FBD and TAAO techniques.
Findings
Achieves task failure rates of up to 96.7% in experiments.
Demonstrates significant degradation of VLA performance due to 3D adversarial textures.
Reveals vulnerabilities of VLA systems to physically grounded attacks.
Abstract
Vision-language-action (VLA) models have shown strong performance in robotic manipulation, yet their robustness to physically realizable adversarial attacks remains underexplored. Existing studies reveal vulnerabilities through language perturbations and 2D visual attacks, but these attack surfaces are either less representative of real deployment or limited in physical realism. In contrast, adversarial 3D textures pose a more physically plausible and damaging threat, as they are naturally attached to manipulated objects and are easier to deploy in physical environments. Bringing adversarial 3D textures to VLA systems is nevertheless nontrivial. A central obstacle is that standard 3D simulators do not provide a differentiable optimization path from the VLA objective function back to object appearance, making it difficult to optimize textures in an end-to-end manner. To address this, we…
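To make the idea of a "differentiable optimization path from the VLA objective function back to object appearance" concrete, here is a minimal toy sketch. It is not the Tex3D method and does not implement FBD or TAAO. The linear "renderer" (`M`, `b`), the linear "policy" (`W`), the squared-deviation loss, and the signed-gradient update are all assumptions chosen for illustration, so that the gradient with respect to the texture can be written by hand.

```python
import numpy as np

rng = np.random.default_rng(0)
n_texels, n_pixels, n_action = 16, 32, 4

# Hypothetical stand-ins (assumptions, not from the paper):
# a linear "renderer" mapping texels to pixels, plus a fixed background,
# and a linear "policy" mapping pixels to an action vector.
M = rng.uniform(0.0, 1.0, size=(n_pixels, n_texels)) / n_texels
b = rng.uniform(0.0, 1.0, size=n_pixels)
W = rng.normal(size=(n_action, n_pixels))


def render(texture):
    return M @ texture + b


def policy(image):
    return W @ image


clean_texture = rng.uniform(0.2, 0.8, size=n_texels)
reference_action = policy(render(clean_texture))


def attack_loss(texture):
    # Squared deviation of the predicted action from the clean-texture action.
    diff = policy(render(texture)) - reference_action
    return 0.5 * float(diff @ diff)


def attack_grad(texture):
    # Chain rule through policy and renderer: d(loss)/d(texture) = M^T W^T diff.
    diff = policy(render(texture)) - reference_action
    return M.T @ (W.T @ diff)


def optimize_texture(texture, steps=50, step_size=0.02):
    # Projected signed-gradient ascent; clipping keeps texels in [0, 1].
    texture = texture.copy()
    for _ in range(steps):
        texture = np.clip(texture + step_size * np.sign(attack_grad(texture)), 0.0, 1.0)
    return texture


# Start slightly off the clean texture so the initial gradient is nonzero.
start_texture = np.clip(clean_texture + rng.normal(scale=0.01, size=n_texels), 0.0, 1.0)
adv_texture = optimize_texture(start_texture)
initial_loss = attack_loss(start_texture)
final_loss = attack_loss(adv_texture)
print(f"loss before: {initial_loss:.6f}, after: {final_loss:.6f}")
```

In this toy setting every stage is differentiable, so the gradient reaches the texture directly. The obstacle the abstract describes is that a standard 3D simulator does not expose the rendering stage in this form.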
