Delving into adversarial attacks on deep policies
Jernej Kos, Dawn Song

TL;DR
This paper investigates adversarial attacks on deep reinforcement learning policies: it compares adversarial examples with random noise as attack methods, proposes a value-function-based method that reduces how often adversarial examples must be injected, and examines how retraining affects robustness.
Contribution
It introduces a novel method, based on the value function, for reducing the number of times adversarial examples must be injected for a successful attack, and analyzes how retraining affects model resilience.
Findings
Adversarial examples are more effective than random noise in attacking policies.
The proposed method reduces the number of adversarial injections needed for success.
Retraining on random noise and FGSM perturbations affects the policy's resilience against adversarial examples.
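The idea of using the value function to decide when to inject can be illustrated with a small sketch. This is only one plausible reading of the summary above, not the authors' implementation: the threshold rule and the names `should_attack`, `run_gated_attack`, `value_fn`, and `perturb_fn` are assumptions made for illustration, and the toy value function is simply the sum of the observation.

```python
def should_attack(value_estimate, threshold):
    """Hypothetical gating rule: inject only when the value estimate is high."""
    return value_estimate > threshold


def run_gated_attack(observations, value_fn, perturb_fn, threshold):
    """Perturb only the observations that pass the gate.

    Returns the (possibly perturbed) observations and the injection count.
    """
    result = []
    injections = 0
    for obs in observations:
        if should_attack(value_fn(obs), threshold):
            result.append(perturb_fn(obs))
            injections += 1
        else:
            result.append(list(obs))  # pass the clean observation through
    return result, injections


# Toy stand-ins (assumptions): a "value function" that sums the observation,
# and a "perturbation" that adds a small constant to every feature.
observations = [[0.1, 0.2], [0.9, 0.8], [0.4, 0.5]]
value_fn = lambda obs: sum(obs)
perturb_fn = lambda obs: [v + 0.05 for v in obs]

perturbed, n_injected = run_gated_attack(
    observations, value_fn, perturb_fn, threshold=1.0
)
print(n_injected, "of", len(observations), "observations were perturbed")
```

In a real agent, `value_fn` would be the policy's learned value estimate and `perturb_fn` an adversarial perturbation such as FGSM; the gate then trades attack frequency against attack success.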
Abstract
Adversarial examples have been shown to exist for a variety of deep learning architectures. Deep reinforcement learning has shown promising results on training agent policies directly on raw inputs such as image pixels. In this paper we present a novel study into adversarial attacks on deep reinforcement learning policies. We compare the effectiveness of the attacks using adversarial examples vs. random noise. We present a novel method for reducing the number of times adversarial examples need to be injected for a successful attack, based on the value function. We further explore how re-training on random noise and FGSM perturbations affects the resilience against adversarial examples.
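For readers unfamiliar with FGSM (the fast gradient sign method), the following is a minimal sketch of the perturbation on a toy policy. It assumes a linear softmax policy with a hand-picked weight matrix `W`, uses the policy's own greedy action as the label, and computes the gradient analytically; the paper's policies are deep networks on image pixels, so this illustrates the update rule x_adv = x + epsilon * sign(grad_x J) rather than the authors' setup.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_probs(W, x):
    """Action probabilities of a toy linear softmax policy (an assumption)."""
    logits = [sum(w * xj for w, xj in zip(row, x)) for row in W]
    return softmax(logits)


def greedy_action(W, x):
    probs = policy_probs(W, x)
    return max(range(len(probs)), key=probs.__getitem__)


def sign(v):
    return (v > 0) - (v < 0)


def fgsm_perturb(W, x, epsilon):
    """x_adv = x + epsilon * sign(grad_x J), with J = -log p(greedy action)."""
    probs = policy_probs(W, x)
    y = greedy_action(W, x)
    # Gradient of the cross-entropy loss with respect to the logits.
    dlogits = [p - (1.0 if a == y else 0.0) for a, p in enumerate(probs)]
    # Chain rule through logits = W x gives grad_x = W^T dlogits.
    grad_x = [
        sum(dlogits[a] * W[a][j] for a in range(len(W)))
        for j in range(len(x))
    ]
    return [xj + epsilon * sign(g) for xj, g in zip(x, grad_x)]


# Toy example: two actions, two input features.
W = [[1.0, -1.0], [-1.0, 1.0]]
x = [0.6, 0.4]
x_adv = fgsm_perturb(W, x, epsilon=0.2)
print("clean action:", greedy_action(W, x))
print("adversarial action:", greedy_action(W, x_adv))
```

In this toy case each feature moves by at most `epsilon`, yet the greedy action flips from 0 to 1. Random noise of the same magnitude has no such guarantee, since its direction is not aligned with the loss gradient.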
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics
