Detecting Adversarial Attacks on Neural Network Policies with Visual Foresight
Yen-Chen Lin, Ming-Yu Liu, Min Sun, Jia-Bin Huang

TL;DR
This paper introduces a defense mechanism for reinforcement learning agents that uses action-conditioned frame prediction to detect and mitigate adversarial attacks, improving safety in safety-critical systems such as autonomous vehicles.
Contribution
The paper proposes a novel detection method leveraging action-conditioned frame prediction to identify adversarial examples in neural network policies.
Findings
Effective detection of adversarial attacks in Atari games
Improved reward retention under attack scenarios
Outperforms baseline detection algorithms
Abstract
Deep reinforcement learning has shown promising results in learning control policies for complex sequential decision-making tasks. However, these neural network-based policies are known to be vulnerable to adversarial examples. This vulnerability poses a potentially serious threat to safety-critical systems such as autonomous vehicles. In this paper, we propose a defense mechanism to defend reinforcement learning agents from adversarial attacks by leveraging an action-conditioned frame prediction module. Our core idea is that the adversarial examples targeting a neural network-based policy are not effective for the frame prediction model. By comparing the action distribution produced by a policy from processing the current observed frame to the action distribution produced by the same policy from processing the predicted frame from the action-conditioned frame prediction module, we…
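The comparison described in the abstract can be sketched as a detection rule. This is a minimal, dependency-free Python sketch under stated assumptions, not the authors' implementation: the Jensen-Shannon divergence as the distance between action distributions, the fixed `threshold`, and the fallback of acting on the predicted frame when an attack is flagged are all choices made here for illustration. The `toy_policy` and `copy_last_frame` functions are hypothetical stand-ins; in the paper the predictor is a learned action-conditioned frame prediction model.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical types: a frame is a flat list of pixel values, and a policy
# maps a frame to a probability distribution over discrete actions.
Frame = List[float]
Policy = Callable[[Frame], List[float]]
# Hypothetical action-conditioned predictor: (past frames, past actions) -> next frame.
Predictor = Callable[[Sequence[Frame], Sequence[int]], Frame]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence (natural log), bounded above by ln 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def detect_and_act(
    policy: Policy,
    predictor: Predictor,
    past_frames: Sequence[Frame],
    past_actions: Sequence[int],
    observed: Frame,
    threshold: float,
) -> Tuple[bool, int, float]:
    """Flag the observed frame as adversarial if the policy's action
    distribution on it diverges from its distribution on the predicted frame."""
    predicted = predictor(past_frames, past_actions)
    p_obs = policy(observed)
    p_pred = policy(predicted)
    score = js_divergence(p_obs, p_pred)
    attacked = score > threshold
    # Assumed mitigation: when an attack is flagged, act on the predicted
    # frame instead of the (possibly perturbed) observed frame.
    dist = p_pred if attacked else p_obs
    action = max(range(len(dist)), key=lambda a: dist[a])
    return attacked, action, score


# --- Hypothetical toy components, for illustration only ---
def toy_policy(frame: Frame) -> List[float]:
    """Two-action softmax policy that prefers action 0 when frame[0] > frame[1]."""
    logits = [5.0 * (frame[0] - frame[1]), 5.0 * (frame[1] - frame[0])]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def copy_last_frame(past_frames: Sequence[Frame], past_actions: Sequence[int]) -> Frame:
    """Trivial stand-in predictor that ignores actions; a real one is learned."""
    return list(past_frames[-1])


if __name__ == "__main__":
    history, actions = [[1.0, 0.0]], [0]
    clean = detect_and_act(toy_policy, copy_last_frame, history, actions, [1.0, 0.0], 0.1)
    # Exaggerated "perturbation" chosen so the toy policy's preference flips.
    perturbed = detect_and_act(toy_policy, copy_last_frame, history, actions, [0.0, 1.0], 0.1)
    print("clean:", clean)
    print("perturbed:", perturbed)
```

On the clean frame the two distributions coincide, so the score is zero and the agent keeps its own action; on the perturbed frame the policy's preference flips, the divergence approaches its ln 2 ceiling, and the agent falls back to the action chosen from the predicted frame. A bounded divergence makes a fixed threshold easier to reason about than unbounded KL, though the distance and threshold used in the paper may differ.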
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Physical Unclonable Functions (PUFs) and Hardware Security
