On the Tension Between Optimality and Adversarial Robustness in Policy Optimization
Haoran Li, Jiayu Lv, Congying Han, Zicheng Zhang, Anqi Li, Yan Liu, Tiande Guo, Nan Jiang

TL;DR
This paper explores the practical conflict between optimality and robustness in deep reinforcement learning, revealing how adversarial training reshapes the landscape and proposing a bilevel framework to balance these goals effectively.
Contribution
It introduces BARPO, a bilevel policy optimization framework that modulates adversary strength so that the theoretical alignment between optimality and robustness can be realized in practice.
Findings
BARPO outperforms standard adversarially robust policy optimization.
Modulating adversary strength improves the navigability of the optimization landscape and the robustness of the learned policy.
The tradeoff between robustness and optimality is linked to the way adversaries reshape the optimization landscape.
Abstract
Optimality and adversarial robustness in deep reinforcement learning have long been regarded as conflicting goals. Nonetheless, recent theoretical insights presented in CAR suggest a potential alignment, raising the important question of how to realize it in practice. This paper first identifies a key gap between theory and practice by comparing standard policy optimization (SPO) and adversarially robust policy optimization (ARPO). Although the two are theoretically consistent, a fundamental tension between robustness and optimality arises in practical policy gradient methods. SPO tends to converge to vulnerable first-order stationary policies (FOSPs) with strong natural performance, whereas ARPO typically favors more robust FOSPs at the expense of reduced returns. Furthermore, we attribute this tradeoff to the reshaping effect of the strongest adversary in ARPO, which…
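For orientation, the two objectives contrasted in the abstract can be sketched as follows. This is a generic formulation, not the paper's own statement: the symbols $V^{\pi}$ (value of policy $\pi$), $\mu_0$ (initial state distribution), $\nu$ (adversary), and $\mathcal{N}_\epsilon$ (set of adversaries with perturbation budget $\epsilon$) are assumed notation, and the adversary is assumed to perturb the observations fed to the policy. The max-min form of ARPO is the one the reviews below refer to.

```latex
\begin{align}
  % SPO: maximize the natural (unperturbed) return
  \text{SPO:}  \quad & \max_{\pi} \; V^{\pi}(\mu_0) \\
  % ARPO: maximize the return under the strongest admissible adversary
  \text{ARPO:} \quad & \max_{\pi} \; \min_{\nu \in \mathcal{N}_\epsilon} \; V^{\pi \circ \nu}(\mu_0)
\end{align}
```

Under this reading, setting $\epsilon = 0$ leaves only the identity adversary and reduces ARPO to SPO, which is one way to picture what varying adversary strength means.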
Peer Reviews
Decision: ICLR 2026 Poster
* The paper was well written and easy to follow. Assumptions and motivations were clearly justified.
* The insights on how the 'valleys' were formed and how to 'bridge' them were novel and interesting.
* Theoretical analyses were sound.
* In experiments BARPO was evaluated against extensive attack/robustness conditions.
* BARPO employs a KL-based surrogate for the inner minimization to approximate the strongest adversary.
* While the paper shows that minimizing this surrogate aligns with minimizing the true adversarial value, the reviewer wonders whether the authors have additional theoretical or empirical insights regarding the convergence rate under this surrogate formulation. In particular, are there quantifiable bounds or guarantees on the degree of robustness potentially lost due to the surrogate approx
- To the best of the reviewer's knowledge, the convergence analysis for the max-min optimization of ARPO is new.
- The geometry illustration (Figure 2) in Section 3.2 is informative and helpful.
- The experiments are comprehensive.
- The proposed bi-level optimization problem is intuitive (although the motivation and the path to reaching it are highly confusing).
- There is no interpretation or explanation for Proposition 3.1. $V^-$ is undefined. I am not sure why this result is useful. It only states that the gap between the nominal return and the return under attack can be much larger for ARPO than for SPO policies. In fact, the whole subsection 3.1.2 feels disconnected from the rest of the work.
- I expected the authors to draw insights from their theoretical analysis in 3.1.1, but they only provide empirical evidence. In addition, the empirical evidence seem
+ The theoretical development is clear, and the paper is well written.
+ The paper explains the FOSPs in SPO and ARPO and motivates the method in a unified and clear way.
+ On MuJoCo benchmarks, BAR-PPO shows strong natural and robust performance, and adding SPO guidance improves clean returns.
--
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics · Stochastic Gradient Optimization Techniques
