Adversarial Attacks on Neural Network Policies
Sandy Huang, Nicolas Papernot, Ian Goodfellow, Yan Duan, Pieter Abbeel

TL;DR
Neural network policies trained with reinforcement learning are vulnerable to adversarial examples: small perturbations to the policy's raw input can significantly degrade test-time performance, regardless of the task or training algorithm.
Contribution
The paper extends adversarial-example analysis from computer vision classifiers to reinforcement learning policies, and characterizes how vulnerable those policies are across tasks and training algorithms in both white-box and black-box settings.
Findings
Adversarial attacks cause significant performance drops in RL policies.
Small, imperceptible input perturbations can deceive policies.
Vulnerability persists across different tasks and training algorithms.
Abstract
Machine learning classifiers are known to be vulnerable to inputs maliciously constructed by adversaries to force misclassification. Such adversarial examples have been extensively studied in the context of computer vision applications. In this work, we show adversarial attacks are also effective when targeting neural network policies in reinforcement learning. Specifically, we show existing adversarial example crafting techniques can be used to significantly degrade test-time performance of trained policies. Our threat model considers adversaries capable of introducing small perturbations to the raw input of the policy. We characterize the degree of vulnerability across tasks and training algorithms, for a subclass of adversarial-example attacks in white-box and black-box settings. Regardless of the learned task or training algorithm, we observe a significant drop in performance, even…
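The threat model in the abstract (an adversary adding a small perturbation to the policy's raw input) can be illustrated with one widely used existing crafting technique, the fast gradient sign method (FGSM). The abstract excerpt does not name the technique or the policy architecture, so everything below is an assumption for illustration: a toy linear softmax policy with random weights, hypothetical function names, and an L-infinity budget `eps`. It is a minimal sketch, not the authors' implementation.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def action_probs(W, b, x):
    """Toy stand-in for a policy: linear logits followed by softmax."""
    return softmax(W @ x + b)

def cross_entropy(W, b, x, action):
    """Loss of the policy output with respect to a fixed action label."""
    return -np.log(action_probs(W, b, x)[action])

def fgsm_perturb(W, b, x, eps):
    """Return an FGSM-perturbed observation and the originally preferred action.

    The action the policy currently prefers is treated as the label, and the
    input is moved by eps in the sign direction of the loss gradient, which
    pushes the policy away from that action.
    """
    p = action_probs(W, b, x)
    action = int(np.argmax(p))
    onehot = np.zeros_like(p)
    onehot[action] = 1.0
    # Analytic gradient of the cross-entropy w.r.t. x for linear logits.
    grad_x = W.T @ (p - onehot)
    return x + eps * np.sign(grad_x), action

# Hypothetical toy setup: 4 actions, 16-dimensional observation.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 16))
b = rng.normal(size=4)
x = rng.normal(size=16)

eps = 0.05
x_adv, action = fgsm_perturb(W, b, x, eps)
loss_clean = cross_entropy(W, b, x, action)
loss_adv = cross_entropy(W, b, x_adv, action)
print(f"loss before: {loss_clean:.4f}, after: {loss_adv:.4f}")
```

Every coordinate of the observation moves by at most `eps`, yet the loss on the originally preferred action cannot decrease, because this loss is convex in the input for a linear softmax model. With a deep policy the gradient would come from automatic differentiation instead of the closed form, and image observations would normally be clipped back to the valid pixel range after the step.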
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications
