Adversarial Robust Deep Reinforcement Learning Requires Redefining Robustness
Ezgi Korkmaz

TL;DR
This paper shows that adversarial vulnerabilities in deep reinforcement learning policies are more widespread than worst-case analyses suggest, and that vanilla training can yield more robust policies than state-of-the-art adversarial training, challenging existing notions of robustness.
Contribution
The paper demonstrates that high sensitivity directions are abundant in the deep neural policy landscape, and that vanilla training can yield more robust policies than adversarial training.
Findings
High sensitivity directions are more common than previously thought.
Vanilla training can produce more robust policies than adversarial training.
Insights into the policy manifold can guide the development of more robust reinforcement learning methods.
Abstract
Learning from raw high dimensional data via interaction with a given environment has been effectively achieved through the utilization of deep neural networks. Yet the observed degradation in policy performance caused by imperceptible worst-case policy dependent translations along high sensitivity directions (i.e. adversarial perturbations) raises concerns on the robustness of deep reinforcement learning policies. In our paper, we show that these high sensitivity directions do not lie only along particular worst-case directions, but rather are more abundant in the deep neural policy landscape and can be found via more natural means in a black-box setting. Furthermore, we show that vanilla training techniques intriguingly result in learning more robust policies compared to the policies learnt via the state-of-the-art adversarial training techniques. We believe our work lays out…
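To make the idea of a black-box, policy-independent perturbation concrete, the following is a minimal sketch and not the paper's method. It assumes a toy linear policy and uses a uniform brightness shift only as an example of a "natural" change to an observation; the function names (`greedy_action`, `brightness_shift`, `action_flip_rate`) are hypothetical. It measures how often the greedy action changes when the perturbation is applied, without using any gradient or internal information about the policy.

```python
import numpy as np


def greedy_action(weights, obs):
    """Greedy action of a toy linear policy: argmax over action scores."""
    return int(np.argmax(weights @ obs))


def brightness_shift(obs, delta):
    """Policy-independent perturbation: add a constant to every pixel, clipped to [0, 1]."""
    return np.clip(obs + delta, 0.0, 1.0)


def action_flip_rate(weights, observations, perturb):
    """Fraction of observations whose greedy action changes under `perturb`."""
    flips = [
        greedy_action(weights, o) != greedy_action(weights, perturb(o))
        for o in observations
    ]
    return float(np.mean(flips))


# Toy example (assumed values): 2 actions, 2-pixel observations in [0, 1].
weights = np.array([[1.0, 0.0],
                    [0.0, 2.0]])
observations = [np.array([0.5, 0.2])]
rate = action_flip_rate(weights, observations, lambda o: brightness_shift(o, 0.2))
```

A high flip rate under such a simple shift would indicate sensitivity along a direction that was not computed from the policy itself. For a real agent, the linear policy would be replaced by the trained network's action selection and the observations by states collected from the environment.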
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Anomaly Detection Techniques and Applications
