Action Robust Reinforcement Learning via Optimal Adversary Aware Policy Optimization

Buqing Nie; Yangqing Fu; Jingtian Ji; Yue Gao

arXiv:2507.03372·cs.LG·July 8, 2025

TL;DR

This paper introduces OA-PI, a framework that improves the robustness of reinforcement learning policies against action perturbations by evaluating and improving them against optimal adversaries. The framework is compatible with popular DRL algorithms and is validated experimentally.

Contribution

The paper proposes OA-PI, a framework that enhances the action robustness of RL policies by integrating optimal-adversary evaluation into policy iteration; it is compatible with TD3 and PPO.

Findings

1. Enhanced robustness of DRL policies against action adversaries.
2. Maintained nominal performance and sample efficiency.
3. Effective across various environments.

Abstract

Reinforcement Learning (RL) has achieved remarkable success in sequential decision tasks. However, recent studies have revealed the vulnerability of RL policies to different perturbations, raising concerns about their effectiveness and safety in real-world applications. In this work, we focus on the robustness of RL policies against action perturbations and introduce a novel framework called Optimal Adversary-aware Policy Iteration (OA-PI). Our framework enhances action robustness under various perturbations by evaluating and improving policy performance against the corresponding optimal adversaries. Moreover, our approach can be integrated into mainstream DRL algorithms such as Twin Delayed DDPG (TD3) and Proximal Policy Optimization (PPO), improving action robustness effectively while maintaining nominal performance and sample efficiency. Experimental results across various…
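The abstract describes OA-PI only at a high level: evaluate a policy against the adversary that is optimal for it, then improve the policy against that adversary. The sketch below illustrates that evaluate/improve loop in the simplest setting we could assume: a small tabular MDP with known dynamics, and a perturbation model in which an adversary overrides the agent's action with probability `alpha` and then plays the worst action for the agent. The perturbation model, the toy MDP, and the helper names `evaluate_vs_optimal_adversary` and `adversary_aware_policy_iteration` are all hypothetical; this is not the paper's OA-PI algorithm, which is built into deep RL methods such as TD3 and PPO.

```python
import numpy as np


def evaluate_vs_optimal_adversary(P, R, policy, alpha, gamma, tol=1e-10):
    """Hypothetical helper (assumed perturbation model): value of a deterministic
    `policy` when, at each step, an adversary overrides the chosen action with
    probability `alpha` and plays the worst action for the agent.

    P: transition tensor of shape (S, A, S); R: reward matrix of shape (S, A).
    """
    n_states = R.shape[0]
    V = np.zeros(n_states)
    while True:
        # Q(s, a) = R(s, a) + gamma * sum_s' P(s' | s, a) V(s')
        Q = R + gamma * np.einsum("sat,t->sa", P, V)
        # Mix the agent's chosen action with the adversary's worst-case action.
        V_new = (1 - alpha) * Q[np.arange(n_states), policy] + alpha * Q.min(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q
        V = V_new


def adversary_aware_policy_iteration(P, R, alpha, gamma, max_iters=100):
    """Hypothetical helper: alternate worst-case evaluation and greedy improvement."""
    n_states = R.shape[0]
    policy = np.zeros(n_states, dtype=int)
    for _ in range(max_iters):
        V, Q = evaluate_vs_optimal_adversary(P, R, policy, alpha, gamma)
        # The adversary's term alpha * min_a Q(s, a) does not depend on the
        # agent's choice at s, so the greedy step is argmax_a Q(s, a).
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return policy, V
        policy = new_policy
    # Iteration budget exhausted: report the value of the last policy.
    V, _ = evaluate_vs_optimal_adversary(P, R, policy, alpha, gamma)
    return policy, V


# Toy 4-state MDP (illustrative, not from the paper): from state 0 the agent
# goes either to a cliff-edge state 1 (reward 10 for action 0, -100 for
# action 1) or to a safe state 2 (reward 6 either way); state 3 is absorbing.
S, A = 4, 2
P = np.zeros((S, A, S))
R = np.zeros((S, A))
P[0, 0, 1] = P[0, 1, 2] = 1.0
P[1, :, 3] = 1.0
R[1] = [10.0, -100.0]
P[2, :, 3] = 1.0
R[2] = [6.0, 6.0]
P[3, :, 3] = 1.0

nominal_policy, _ = adversary_aware_policy_iteration(P, R, alpha=0.0, gamma=0.9)
robust_policy, robust_V = adversary_aware_policy_iteration(P, R, alpha=0.1, gamma=0.9)
print(nominal_policy[0], robust_policy[0])  # 0 (edge) without the adversary, 1 (safe) with it
```

Without the adversary the greedy policy heads for the cliff edge (nominal return 10). With `alpha = 0.1`, the worst-case value of the edge drops to 0.9 · 10 + 0.1 · (−100) = −1, so the adversary-aware policy takes the safe path and trades nominal return for robustness.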