Delving into adversarial attacks on deep policies

Jernej Kos; Dawn Song

arXiv:1705.06452 · stat.ML · May 19, 2017 · 121 cites

TL;DR

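This paper studies adversarial attacks on deep reinforcement learning policies. It compares attacks that use adversarial examples against attacks that use random noise, proposes a value-function-based method for reducing how often adversarial examples must be injected, and examines how re-training affects robustness.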

Contribution

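It introduces a method, based on the value function, that reduces the number of times adversarial examples need to be injected for a successful attack, and it analyzes how re-training on random noise and FGSM perturbations affects resilience against adversarial examples.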

Findings

01

Adversarial examples are more effective than random noise in attacking policies.

02

The proposed value-function-based method reduces the number of times adversarial examples must be injected for a successful attack.

03

Retraining on noise and FGSM perturbations influences policy robustness.
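
The injection-reduction idea in finding 02 can be illustrated with a minimal sketch. The summary does not state the exact rule, so this assumes a simple threshold: the attacker perturbs only those time steps where the agent's value estimate exceeds a cutoff. The names `select_injection_steps`, `values`, and `threshold` are hypothetical and not taken from the paper.

```python
import numpy as np

def select_injection_steps(values, threshold):
    """Return indices of time steps whose value estimate exceeds `threshold`.

    Assumption: a fixed threshold on the value function decides when to attack.
    """
    values = np.asarray(values, dtype=float)
    return np.flatnonzero(values > threshold)

# Illustrative value estimates for a five-step episode (made-up numbers).
values = [0.1, 0.9, 0.4, 1.2, 0.05]
steps = select_injection_steps(values, threshold=0.5)
injection_rate = len(steps) / len(values)
```

Under this assumed rule, the attack touches only the selected steps rather than every frame, so the fraction of perturbed frames (`injection_rate`) drops as the threshold rises.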

Abstract

Adversarial examples have been shown to exist for a variety of deep learning architectures. Deep reinforcement learning has shown promising results on training agent policies directly on raw inputs such as image pixels. In this paper we present a novel study into adversarial attacks on deep reinforcement learning policies. We compare the effectiveness of the attacks using adversarial examples vs. random noise. We present a novel method for reducing the number of times adversarial examples need to be injected for a successful attack, based on the value function. We further explore how re-training on random noise and FGSM perturbations affects the resilience against adversarial examples.
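
To make the FGSM perturbation mentioned in the abstract concrete, here is a minimal sketch of one fast gradient sign step. It assumes a toy linear-softmax policy in place of a deep network, and it uses cross-entropy against the policy's own greedy action as the loss. The weights `W`, bias `b`, observation `x`, and step size `epsilon` are all hypothetical stand-ins, not the paper's setup.

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)  # shift logits for numerical stability
    e = np.exp(z)
    return e / e.sum()

def fgsm_perturb(x, W, b, epsilon):
    """One FGSM step against a linear-softmax policy pi(a|x) = softmax(Wx + b).

    Assumption: the loss is cross-entropy with the policy's greedy action
    as the label, so the step pushes the policy away from that action.
    """
    probs = softmax(W @ x + b)
    action = int(np.argmax(probs))
    onehot = np.zeros_like(probs)
    onehot[action] = 1.0
    # Cross-entropy gradient w.r.t. logits is (probs - onehot); chain through W.
    grad_x = W.T @ (probs - onehot)
    return x + epsilon * np.sign(grad_x)

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 8))  # 3 actions, 8-dimensional observation (toy sizes)
b = np.zeros(3)
x = rng.normal(size=8)
epsilon = 0.1

x_adv = fgsm_perturb(x, W, b, epsilon)
greedy = int(np.argmax(softmax(W @ x + b)))
p_before = softmax(W @ x + b)[greedy]
p_after = softmax(W @ x_adv + b)[greedy]
```

Each input component moves by at most `epsilon`, and the probability of the originally preferred action falls (`p_after < p_before`). A random-noise baseline of the same magnitude would replace `np.sign(grad_x)` with random signs.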

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics