Diffusion Guided Adversarial State Perturbations in Reinforcement Learning
Xiaolin Sun, Feidi Liu, Zhengming Ding, ZiZhan Zheng

TL;DR
This paper introduces SHIFT, a diffusion-based attack method that creates semantically meaningful adversarial state perturbations in reinforcement learning, exposing vulnerabilities of current defenses and emphasizing the need for more robust policies.
Contribution
The paper presents SHIFT, a novel diffusion-guided attack that surpasses traditional norm-based attacks by generating realistic, semantics-altering adversarial states in RL environments.
Findings
SHIFT effectively breaks existing defenses against RL attacks.
The attack produces more perceptually stealthy adversarial states.
Current defenses are vulnerable to semantics-aware perturbations.
Abstract
Reinforcement learning (RL) systems, while achieving remarkable success across various domains, are vulnerable to adversarial attacks. This is especially a concern in vision-based environments where minor manipulations of high-dimensional image inputs can easily mislead the agent's behavior. To this end, various defenses have been proposed recently, with state-of-the-art approaches achieving robust performance even under large state perturbations. However, after closer investigation, we found that the effectiveness of the current defenses is due to a fundamental weakness of the existing norm-constrained attacks, which can barely alter the semantics of image input even under a relatively large perturbation budget. In this work, we propose SHIFT, a novel policy-agnostic diffusion-based state perturbation attack to go beyond this limitation. Our attack is able to generate perturbed…
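For readers unfamiliar with the norm-constrained attacks that the abstract contrasts SHIFT against, the following minimal sketch shows what an l-infinity perturbation budget means for an image observation. It is not the SHIFT method and not code from the paper: the function name `project_linf`, the 84x84 frame size, the budget of 15/255, and the random "candidate" perturbation are all assumptions chosen for illustration.

```python
import numpy as np


def project_linf(clean, perturbed, epsilon):
    """Project `perturbed` into the l_inf ball of radius `epsilon` around `clean`.

    Hypothetical helper for illustration: every pixel may move by at most
    `epsilon`, and the result is kept in the valid pixel range [0, 1].
    """
    delta = np.clip(perturbed - clean, -epsilon, epsilon)
    return np.clip(clean + delta, 0.0, 1.0)


rng = np.random.default_rng(0)
clean = rng.random((84, 84))  # stand-in for a grayscale frame (assumed size)
candidate = clean + rng.normal(scale=0.5, size=clean.shape)  # arbitrary large change
epsilon = 15 / 255  # assumed budget, for illustration only

adv = project_linf(clean, candidate, epsilon)
max_change = np.abs(adv - clean).max()
print(f"largest per-pixel change: {max_change:.4f} (budget {epsilon:.4f})")
```

However large the candidate change is, the projected state differs from the clean frame by at most the budget at each pixel, so the scene content (object positions, for example) stays essentially the same. This is the limitation the abstract attributes to norm-constrained attacks.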
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI · Explainable Artificial Intelligence (XAI)
