Regret-Based Defense in Adversarial Reinforcement Learning
Roman Belaire, Pradeep Varakantham, Thanh Nguyen, David Lo

TL;DR
This paper introduces a proactive regret-based method to enhance the robustness of Deep Reinforcement Learning policies against adversarial observation noise, outperforming reactive approaches in various benchmarks.
Contribution
It proposes a novel regret minimization framework for robust Deep RL, directly optimizing for worst-case performance in the presence of adversarial perturbations.
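As a rough illustration of what regret under observation perturbation can mean, the following minimal sketch uses a one-step toy problem. It is not the paper's algorithm: the table `Q`, the `NEIGHBORS` map, and every function name are hypothetical and chosen only for illustration. Regret is taken here as the gap between the best achievable value in the true state and the value of the action chosen from a possibly perturbed observation.

```python
import numpy as np

# Hypothetical toy problem (an assumption, not from the paper):
# Q[s, a] is the return of taking action a when the TRUE state is s.
Q = np.array([
    [1.0, 0.8],  # state 0: action 0 is slightly better
    [0.0, 0.7],  # state 1: action 0 is very costly
    [0.9, 1.0],  # state 2: action 1 is slightly better
])

# NEIGHBORS[s] lists the observations an adversary may show the agent
# when the true state is s (the unperturbed observation s is included).
NEIGHBORS = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}


def greedy_policy(obs):
    """Trust the observation and take the action with the highest value."""
    return int(np.argmax(Q[obs]))


def min_max_regret_policy(obs):
    """Pick the action whose worst-case regret is smallest over every
    true state that could have produced this observation."""
    candidates = [s for s, seen in NEIGHBORS.items() if obs in seen]
    # regret[i, a] = best value in candidate state i minus value of action a
    regret = Q[candidates].max(axis=1, keepdims=True) - Q[candidates]
    return int(np.argmin(regret.max(axis=0)))


def worst_case_regret(policy, true_state):
    """Largest regret the adversary can force in true_state."""
    best = Q[true_state].max()
    return max(best - Q[true_state, policy(z)] for z in NEIGHBORS[true_state])


if __name__ == "__main__":
    for name, policy in [("greedy", greedy_policy),
                         ("min-max regret", min_max_regret_policy)]:
        worst = max(worst_case_regret(policy, s) for s in NEIGHBORS)
        print(f"{name}: worst-case regret = {worst:.2f}")
```

In this toy setting the greedy policy can be pushed into the costly action in state 1, whereas the regret-aware policy gives up a little value in state 0 to cap its worst-case loss. A full method would have to handle sequential decisions and learned value estimates, which this sketch does not attempt.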
Findings
Significant performance improvements over existing methods.
Effective in a wide range of benchmark environments.
Proactive approach reduces vulnerability to unseen adversarial examples.
Abstract
Deep Reinforcement Learning (DRL) policies have been shown to be vulnerable to small adversarial noise in observations. Such adversarial noise can have disastrous consequences in safety-critical environments. For instance, a self-driving car that receives adversarially perturbed sensory observations about nearby signs (e.g., a stop sign physically altered to be perceived as a speed limit sign) or objects (e.g., cars altered to be recognized as trees) can cause a fatal accident. Existing approaches for making RL algorithms robust to an observation-perturbing adversary are reactive: they iteratively improve against adversarial examples generated at each iteration. While such approaches have been shown to improve on regular RL methods, they can fare significantly worse if certain categories of adversarial examples are not generated during training. To…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Anomaly Detection Techniques and Applications
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
