Investigating Vulnerabilities of Deep Neural Policies
Ezgi Korkmaz

TL;DR
This paper investigates how adversarial training affects deep neural policies in reinforcement learning. It finds that adversarially trained policies are more sensitive to low-frequency perturbations and that their feature sensitivities differ from those of vanilla-trained policies.
Contribution
It analyzes the effects of adversarial training on neural policies along two lines: comparing the Fourier spectrum of minimal perturbations, and comparing feature sensitivities, for adversarially trained and vanilla-trained policies.
Findings
Adversarially trained policies are more sensitive to low-frequency perturbations.
Fourier analysis of minimal perturbations shows a focus on lower frequencies for adversarially trained policies.
Feature sensitivities differ between adversarially trained and vanilla-trained policies, indicating differences in robustness between the two training methods.
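To make the notion of a "low-frequency" perturbation concrete, the sketch below measures what fraction of a 2D perturbation's spectral energy lies near the zero frequency. It is an illustration only and not the paper's procedure: the function name low_frequency_energy_fraction, the radius cutoff, the 84x84 frame size, and the two synthetic perturbations are all assumptions made for this example. How the paper computes minimal perturbations is not shown here.

```python
import numpy as np


def low_frequency_energy_fraction(perturbation, radius):
    """Fraction of spectral energy within `radius` of the zero frequency.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # 2D DFT, shifted so the zero-frequency component sits at the center.
    spectrum = np.fft.fftshift(np.fft.fft2(perturbation))
    energy = np.abs(spectrum) ** 2

    h, w = perturbation.shape
    ys, xs = np.ogrid[:h, :w]
    # After fftshift the zero frequency is at index (h // 2, w // 2).
    dist = np.sqrt((ys - h // 2) ** 2 + (xs - w // 2) ** 2)

    total = energy.sum()
    if total == 0:
        return 0.0
    return float(energy[dist <= radius].sum() / total)


# Two synthetic stand-ins for perturbations (assumed 84x84 frames).
size = 84
rng = np.random.default_rng(0)
yy, xx = np.mgrid[:size, :size]

# One sine cycle across the frame: energy sits at the lowest nonzero frequency.
smooth = np.sin(2 * np.pi * xx / size)
# White noise: energy is spread across all frequencies.
noisy = rng.standard_normal((size, size))

smooth_fraction = low_frequency_energy_fraction(smooth, radius=4)
noisy_fraction = low_frequency_energy_fraction(noisy, radius=4)
print(smooth_fraction, noisy_fraction)
```

Under these assumptions, the smooth pattern places almost all of its energy inside the cutoff while the white-noise pattern places only a small share there. Applying the same measurement to perturbations computed for different policies would be one way to compare how their spectra are distributed.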
Abstract
Reinforcement learning policies based on deep neural networks are vulnerable to imperceptible adversarial perturbations to their inputs, in much the same way as neural network image classifiers. Recent work has proposed several methods to improve the robustness of deep reinforcement learning agents to adversarial perturbations based on training in the presence of these imperceptible perturbations (i.e. adversarial training). In this paper, we study the effects of adversarial training on the neural policy learned by the agent. In particular, we follow two distinct parallel approaches to investigate the outcomes of adversarial training on deep neural policies based on worst-case distributional shift and feature sensitivity. For the first approach, we compare the Fourier spectrum of minimal perturbations computed for both adversarially trained and vanilla trained neural policies. Via…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI
