Rethinking the Intermediate Features in Adversarial Attacks: Misleading Robotic Models via Adversarial Distillation
Ke Zhao (1), Huayang Huang (1), Miao Li (1), Yu Wu (1) ((1) Wuhan University)

TL;DR
This paper introduces a novel adversarial prompt attack for language-conditioned robotic models, using continuous action optimization and intermediate feature analysis to improve attack success and transferability across tasks and models.
Contribution
The paper proposes an adversarial attack method that leverages continuous action representations and intermediate feature gradients to expose robustness weaknesses in language-conditioned robotic models.
Findings
The attack outperforms existing methods on 13 manipulation tasks.
Adversarial prefixes effectively induce unintended robot actions.
The method demonstrates strong transferability across model variants.
Abstract
Language-conditioned robotic learning has significantly enhanced robot adaptability by enabling a single model to execute diverse tasks in response to verbal commands. Despite these advancements, security vulnerabilities within this domain remain largely unexplored. This paper addresses this gap by proposing a novel adversarial prompt attack tailored to language-conditioned robotic models. Our approach involves crafting a universal adversarial prefix that induces the model to perform unintended actions when added to any original prompt. We demonstrate that existing adversarial techniques exhibit limited effectiveness when directly transferred to the robotic domain due to the inherent robustness of discretized robotic action spaces. To overcome this challenge, we propose to optimize adversarial prefixes based on continuous action representations, circumventing the discretization process.…
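The abstract's point about discretization can be illustrated with a toy sketch. This is not the paper's model or attack: `continuous_action`, `discretize`, `optimize_prefix`, the weights, the prompts, and the additive continuous prefix are all hypothetical stand-ins chosen for illustration. The sketch shows two things: a small change to the prefix moves the continuous action but leaves the discretized action token unchanged (so a loss defined on tokens gives no signal), and a single prefix optimized on the continuous action pushes several different prompts toward the same unintended target action.

```python
import math

def continuous_action(prefix, prompt, weights):
    # Hypothetical toy policy: the prefix is added to the prompt features
    # and the result is squashed into a scalar action in (-1, 1).
    z = sum(w * (p + q) for w, p, q in zip(weights, prefix, prompt))
    return math.tanh(z)

def discretize(action, n_bins=256):
    # Map an action in [-1, 1] to one of n_bins integer action tokens.
    idx = int((action + 1.0) / 2.0 * n_bins)
    return min(max(idx, 0), n_bins - 1)

def optimize_prefix(prompts, weights, target, steps=200, lr=0.5):
    # Gradient descent on the mean squared error between the continuous
    # action and the target, averaged over all prompts (a universal prefix).
    prefix = [0.0] * len(weights)
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for prompt in prompts:
            a = continuous_action(prefix, prompt, weights)
            # d/d prefix_i of (a - target)^2 = 2 (a - target) (1 - a^2) w_i
            g = 2.0 * (a - target) * (1.0 - a * a)
            for i, w in enumerate(weights):
                grad[i] += g * w / len(prompts)
        prefix = [p - lr * g for p, g in zip(prefix, grad)]
    return prefix

# Assumed toy values, not taken from the paper.
weights = [0.8, -0.5, 0.3]
prompts = [[0.2, 0.1, -0.3], [-0.4, 0.6, 0.1], [0.5, -0.2, 0.4]]
target = -0.9  # the unintended action the attacker wants

# 1) A tiny prefix change shifts the continuous action but not the token.
zero = [0.0, 0.0, 0.0]
base = continuous_action(zero, prompts[0], weights)
nudged = continuous_action([1e-4, 0.0, 0.0], prompts[0], weights)
print(discretize(base) == discretize(nudged), nudged - base)

# 2) One prefix, optimized on continuous actions, affects every prompt.
prefix = optimize_prefix(prompts, weights, target)
clean = [continuous_action(zero, p, weights) for p in prompts]
attacked = [continuous_action(prefix, p, weights) for p in prompts]
for c, a in zip(clean, attacked):
    print(f"clean={c:+.3f} attacked={a:+.3f} token={discretize(a)}")
```

In a real language-conditioned policy the prefix would consist of discrete tokens and the action would be multi-dimensional, so the optimization would be considerably harder than this scalar example suggests.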
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Physical Unclonable Functions (PUFs) and Hardware Security · Anomaly Detection Techniques and Applications
