Action Robust Reinforcement Learning via Optimal Adversary Aware Policy Optimization
Buqing Nie, Yangqing Fu, Jingtian Ji, Yue Gao

TL;DR
This paper introduces Optimal Adversary-aware Policy Iteration (OA-PI), a framework that improves the robustness of reinforcement learning policies to action perturbations by evaluating and improving them against the corresponding optimal adversaries. It is compatible with popular DRL algorithms and is validated experimentally.
Contribution
The paper proposes the OA-PI framework, which enhances the action robustness of RL policies by integrating optimal-adversary evaluation into policy iteration. The framework is compatible with TD3 and PPO.
Findings
Enhanced robustness of DRL policies against action adversaries.
Maintained nominal performance and sample efficiency.
Effective across various environments.
Abstract
Reinforcement Learning (RL) has achieved remarkable success in sequential decision tasks. However, recent studies have revealed the vulnerability of RL policies to different perturbations, raising concerns about their effectiveness and safety in real-world applications. In this work, we focus on the robustness of RL policies against action perturbations and introduce a novel framework called Optimal Adversary-aware Policy Iteration (OA-PI). Our framework enhances action robustness under various perturbations by evaluating and improving policy performance against the corresponding optimal adversaries. Besides, our approach can be integrated into mainstream DRL algorithms such as Twin Delayed DDPG (TD3) and Proximal Policy Optimization (PPO), improving action robustness effectively while maintaining nominal performance and sample efficiency. Experimental results across various…
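To make the idea of evaluating and improving a policy against its optimal adversary concrete, here is a minimal tabular sketch. It is not the paper's algorithm, which targets deep RL methods such as TD3 and PPO. The adversary model is an assumption made for illustration: the adversary may replace the agent's discrete action with any action whose index is within `radius` of it. The helper names (`neighbor_mask`, `worst_case_q`, `evaluate_against_optimal_adversary`, `robust_policy_iteration`) and the toy single-state MDP are likewise hypothetical.

```python
import numpy as np


def neighbor_mask(n_actions, radius):
    # mask[a, b] is True when the (assumed) adversary may swap action a for b.
    idx = np.arange(n_actions)
    return np.abs(idx[:, None] - idx[None, :]) <= radius


def worst_case_q(P, R, V, gamma, mask):
    # Nominal action values, shape (S, A): R[s, a] + gamma * sum_s' P[s, a, s'] V[s'].
    q_nominal = R + gamma * (P @ V)
    # For each intended action a, the adversary picks the allowed swap b
    # that minimizes the agent's value.
    candidates = np.where(mask[None, :, :], q_nominal[:, None, :], np.inf)
    return candidates.min(axis=2)


def evaluate_against_optimal_adversary(P, R, policy, gamma, mask,
                                       tol=1e-10, max_iters=10_000):
    # Fixed-point iteration for the value of `policy` under the worst-case adversary.
    states = np.arange(P.shape[0])
    V = np.zeros(P.shape[0])
    for _ in range(max_iters):
        V_new = worst_case_q(P, R, V, gamma, mask)[states, policy]
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V


def robust_policy_iteration(P, R, gamma, mask, max_rounds=100):
    # Alternate adversary-aware evaluation with greedy improvement
    # on the worst-case action values.
    policy = np.zeros(P.shape[0], dtype=int)
    for _ in range(max_rounds):
        V = evaluate_against_optimal_adversary(P, R, policy, gamma, mask)
        new_policy = worst_case_q(P, R, V, gamma, mask).argmax(axis=1)
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    return policy, evaluate_against_optimal_adversary(P, R, policy, gamma, mask)


# Toy single-state MDP (hypothetical): action 0 has the highest nominal reward,
# but a one-step swap lands on action 1, which pays nothing.
P = np.ones((1, 5, 1))
R = np.array([[10.0, 0.0, 5.0, 6.0, 5.0]])
gamma = 0.9
mask = neighbor_mask(n_actions=5, radius=1)

nominal_action = int(R[0].argmax())
robust_policy, robust_V = robust_policy_iteration(P, R, gamma, mask)
print("nominal greedy action:", nominal_action)
print("adversary-aware action:", robust_policy.tolist(), "value:", robust_V)
```

In this toy problem the nominally best action sits next to a zero-reward action, so the adversary-aware policy prefers an action whose neighbors all pay reasonably well. The sketch only illustrates the worst-case evaluation and improvement loop; it makes no claim about how the paper handles continuous actions or function approximation.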
