Adversarial Attacks on Neural Network Policies

Sandy Huang; Nicolas Papernot; Ian Goodfellow; Yan Duan; Pieter Abbeel

arXiv:1702.02284·cs.LG·February 9, 2017·68 cites

TL;DR

This paper demonstrates that neural network policies in reinforcement learning are vulnerable to adversarial attacks: small perturbations to a policy's raw input can significantly degrade its test-time performance, across tasks and training algorithms.

Contribution

It extends adversarial attack analysis from computer vision to reinforcement learning policies, highlighting their vulnerability and characterizing attack effectiveness in different settings.

Findings

01. Adversarial attacks cause significant performance drops in RL policies.

02. Small, imperceptible input perturbations can deceive policies.

03. Vulnerability persists across different tasks and training algorithms.

Abstract

Machine learning classifiers are known to be vulnerable to inputs maliciously constructed by adversaries to force misclassification. Such adversarial examples have been extensively studied in the context of computer vision applications. In this work, we show adversarial attacks are also effective when targeting neural network policies in reinforcement learning. Specifically, we show existing adversarial example crafting techniques can be used to significantly degrade test-time performance of trained policies. Our threat model considers adversaries capable of introducing small perturbations to the raw input of the policy. We characterize the degree of vulnerability across tasks and training algorithms, for a subclass of adversarial-example attacks in white-box and black-box settings. Regardless of the learned task or training algorithm, we observe a significant drop in performance, even…
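
To make the threat model concrete, the sketch below perturbs the raw input of a toy policy with the fast gradient sign method (FGSM), one well-known adversarial-example crafting technique. The abstract above does not name the technique or the policy architecture, so everything here is an assumption for illustration: the linear-softmax policy, the weights `W`, the observation `x`, the budget `eps`, and the helper names `policy_probs` and `fgsm_perturb` are all hypothetical. The gradient is available only because the attacker is assumed to know the policy's parameters, which corresponds to the white-box setting.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def policy_probs(W, x):
    # Hypothetical toy policy: action probabilities = softmax(W @ x).
    logits = [sum(w * xj for w, xj in zip(row, x)) for row in W]
    return softmax(logits)

def fgsm_perturb(W, x, eps):
    # Treat the policy's own preferred action as the label, then step the
    # input in the sign of the gradient of the cross-entropy loss.
    probs = policy_probs(W, x)
    action = max(range(len(probs)), key=probs.__getitem__)
    # d(-log p[action]) / d(logits) = probs - onehot(action)
    dlogits = [p - (1.0 if i == action else 0.0) for i, p in enumerate(probs)]
    # Chain rule through the linear layer: grad_x = W^T @ dlogits
    grad_x = [sum(W[i][j] * dlogits[i] for i in range(len(W)))
              for j in range(len(x))]
    sign = lambda g: (g > 0) - (g < 0)
    # Each input component moves by at most eps (an L-infinity budget).
    x_adv = [xj + eps * sign(g) for xj, g in zip(x, grad_x)]
    return x_adv, action

# Illustrative values only; not taken from the paper.
W = [[1.0, -0.5, 0.2],
     [-0.3, 0.8, 0.1]]
x = [0.6, 0.4, 0.5]

x_adv, action = fgsm_perturb(W, x, eps=0.1)
p_clean = policy_probs(W, x)[action]
p_adv = policy_probs(W, x_adv)[action]
print(f"preferred action {action}: p_clean={p_clean:.3f}, p_adv={p_adv:.3f}")
```

For this toy policy the loss is convex in the input, so the signed step can only lower the probability of the originally preferred action; a larger `eps` can change which action the policy selects. Deep policies carry no such guarantee, which is why the effect has to be measured empirically.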

Code & Models

Repositories

ssg-research/ad3-action-distribution-divergence-detector
pytorch

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications