ANO: A Principled Approach to Robust Policy Optimization

Yiheng Zhang; Yiming Wang; Kaiyan Zhao; Zhenglin Wan; Jiayu Chen; Leong Hou U

arXiv:2605.02320·cs.AI·May 7, 2026

TL;DR

The paper introduces ANO, a new policy optimization method that replaces hard clipping with a robust, smooth mechanism, improving stability and performance in reinforcement learning and language model alignment.

Contribution

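ANO (Anchored Neighborhood Optimization) is a policy optimization method derived from geometric design principles. It replaces hard clipping with a redescending gradient mechanism, offering a robust alternative to existing methods such as PPO and SPO.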

Findings

01. ANO outperforms existing methods in MuJoCo and Atari control tasks.

02. ANO prevents policy collapse even at high learning rates.

03. In LLM alignment, ANO avoids catastrophic KL divergence explosions.

Abstract

Proximal Policy Optimization (PPO) dominates reinforcement learning and LLM alignment but relies on a "hard clipping" mechanism that discards valuable gradients. Conversely, unconstrained methods like SPO expose the optimization to unbounded updates, causing severe instability and policy collapse during extreme outlier encounters. To resolve this dilemma, we introduce a principled design space for policy optimization, demonstrating that a robust estimator must inherently suppress outliers while maintaining a smooth restoration force. Guided by these geometric principles, we derive Anchored Neighborhood Optimization (ANO), a novel method that seamlessly replaces hard clipping with a redescending gradient mechanism. Extensive evaluations demonstrate ANO's empirical superiority across diverse domains. In continuous (MuJoCo) and discrete (Atari) control, ANO establishes a robust…
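
The contrast the abstract draws between "hard clipping" and a "redescending gradient mechanism" can be illustrated with a small sketch. The first function computes the derivative of the standard PPO clipped surrogate with respect to the probability ratio, which is exactly zero once a sample is clipped. The second is Tukey's biweight influence function, a textbook redescending estimator applied here to the deviation `ratio - 1`. It is a stand-in chosen for illustration only: the ANO objective itself is not given in this excerpt, and the function names, the cutoff `c`, and the default `eps=0.2` are assumptions rather than values from the paper.

```python
def ppo_clip_grad_wrt_ratio(ratio, advantage, eps=0.2):
    """Derivative of min(r*A, clip(r, 1-eps, 1+eps)*A) with respect to r.

    eps=0.2 is a common default, assumed here rather than taken from the paper.
    """
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    # The min selects the unclipped term whenever it is the smaller one;
    # in that case the gradient is the advantage itself.
    if ratio * advantage <= clipped * advantage:
        return advantage
    # Otherwise the clipped (constant) term is active: the gradient is discarded.
    return 0.0


def tukey_biweight_psi(u, c=1.0):
    """Tukey's biweight influence function (illustrative, NOT the ANO objective).

    Grows roughly linearly near u = 0, then smoothly redescends to zero at |u| = c.
    """
    if abs(u) >= c:
        return 0.0
    return u * (1.0 - (u / c) ** 2) ** 2


if __name__ == "__main__":
    # Compare the two behaviours as the ratio moves away from 1 (positive advantage).
    for r in (1.0, 1.1, 1.3, 1.6, 2.5):
        hard = ppo_clip_grad_wrt_ratio(r, advantage=1.0)
        smooth = tukey_biweight_psi(r - 1.0, c=1.0)
        print(f"ratio={r:.1f}  ppo_grad={hard:.3f}  biweight_psi={smooth:.3f}")
```

The hard-clipped gradient switches from the full advantage to zero at the clip boundary, while the redescending function tapers off continuously and still vanishes for extreme outliers. That is the qualitative property the abstract describes as suppressing outliers while keeping a smooth restoring force.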
