Robust Deep Reinforcement Learning against Adversarial Perturbations on State Observations

Huan Zhang; Hongge Chen; Chaowei Xiao; Bo Li; Mingyan Liu; Duane Boning; Cho-Jui Hsieh

arXiv:2003.08938·cs.LG·July 15, 2021·38 cites

TL;DR

This paper introduces a theoretically grounded method to enhance the robustness of deep reinforcement learning agents against adversarial perturbations in state observations, improving performance under attack and in some cases even without adversaries.

Contribution

The paper proposes the state-adversarial MDP (SA-MDP) framework and a policy regularization technique that applies to a range of DRL algorithms and improves robustness to adversarial perturbations of state observations.
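
The page does not spell out the regularizer, so the following is only a rough sketch of what a state-perturbation policy regularizer can look like. It assumes an l-infinity ball of radius `eps` around the observed state, a toy linear Gaussian policy with mean `W @ s` and a fixed standard deviation, and a few steps of sign-gradient ascent as the inner maximizer. All names (`smoothness_penalty`, `gaussian_kl_shared_std`, `W`, `eps`) are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
import numpy as np


def gaussian_kl_shared_std(mu_p, mu_q, std):
    """KL(N(mu_p, diag(std^2)) || N(mu_q, diag(std^2))) for a shared std."""
    return 0.5 * float(np.sum(((mu_p - mu_q) / std) ** 2))


def smoothness_penalty(W, std, s, eps, steps=10, seed=0):
    """Approximate max over ||delta||_inf <= eps of KL(pi(.|s) || pi(.|s + delta)).

    Toy assumption: pi(.|s) = N(W @ s, diag(std^2)). The inner maximization
    uses projected sign-gradient ascent, so the result is a lower bound on
    the true maximum, not a certified value.
    """
    rng = np.random.default_rng(seed)
    # Start from a random point: the KL gradient is exactly zero at delta = 0.
    delta = rng.uniform(-eps, eps, size=s.shape)
    step_size = eps / 4.0
    mu_clean = W @ s
    for _ in range(steps):
        diff = W @ (s + delta) - mu_clean
        # Analytic gradient of 0.5 * ||diff / std||^2 with respect to delta.
        grad = W.T @ (diff / std ** 2)
        # Ascent step, then project back onto the l-infinity ball.
        delta = np.clip(delta + step_size * np.sign(grad), -eps, eps)
    penalty = gaussian_kl_shared_std(mu_clean, W @ (s + delta), std)
    return penalty, delta


if __name__ == "__main__":
    W = np.array([[1.0, -2.0, 0.5], [0.0, 1.0, 3.0]])
    s = np.array([0.2, -0.1, 0.4])
    penalty, delta = smoothness_penalty(W, std=0.5, s=s, eps=0.1)
    print("penalty:", penalty, "perturbation:", delta)
```

In a training loop, a penalty of this kind would typically be added to the base algorithm's policy loss with a weighting coefficient, which is what makes the idea portable across different DRL algorithms.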

Findings

01

Robust DRL agents outperform baseline agents under strong adversarial attacks.

02

In some cases, the robust policies also improve performance in non-adversarial environments.

03

The proposed method is applicable to multiple DRL algorithms including PPO, DDPG, and DQN.

Abstract

A deep reinforcement learning (DRL) agent observes its states through observations, which may contain natural measurement errors or adversarial noise. Since the observations deviate from the true states, they can mislead the agent into taking suboptimal actions. Several works have shown this vulnerability via adversarial attacks, but existing approaches to improving the robustness of DRL in this setting have had limited success and lack theoretical principles. We show that naively applying existing techniques for improving robustness in classification tasks, such as adversarial training, is ineffective for many RL tasks. We propose the state-adversarial Markov decision process (SA-MDP) to study the fundamental properties of this problem, and develop a theoretically principled policy regularization which can be applied to a large family of DRL algorithms, including proximal policy…

Code & Models

Videos

Robust Deep Reinforcement Learning against Adversarial Perturbations on State Observations· slideslive

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Reinforcement Learning in Robotics

Methods: Entropy Regularization · Proximal Policy Optimization · Experience Replay · Weight Decay · Q-Learning · Adam · Batch Normalization · Deep Deterministic Policy Gradient · Dense Connections