Symmetric Behavior Regularized Policy Optimization

Lingwei Zhu; Haseeb Shah; Zheng Chen; Yukie Nagai; Martha White

arXiv:2508.04225 · cs.LG · December 2, 2025

TL;DR

This paper explores symmetric regularization in offline reinforcement learning. It introduces S$f$-AC, a novel algorithm based on a Taylor expansion of the symmetric divergence loss, and demonstrates its effectiveness across multiple D4RL MuJoCo tasks while avoiding failure modes seen in other offline RL methods.

Contribution

The paper is the first to analyze symmetric regularization in behavior regularized policy optimization, proposing a Taylor series approximation and a new algorithm for stable, effective offline RL.

Findings

01. S$f$-AC outperforms existing methods on D4RL MuJoCo tasks.

02. Symmetric regularization can be effectively approximated using a Taylor series.

03. S$f$-AC avoids failures seen in other algorithms such as IQL, SQL, XQL, and AWAC.
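The Taylor approximation in finding 02 relates to a standard identity: for a convex $f$ with $f(1) = 0$, expanding $f$ around 1 gives $D_f(\pi\|\mu) = \mathbb{E}_{\mu}[f(\pi/\mu)] = \sum_{n \ge 2} \frac{f^{(n)}(1)}{n!}\,\chi^{n}(\pi\|\mu)$, where $\chi^{n}(\pi\|\mu) = \mathbb{E}_{\mu}[(\pi/\mu - 1)^{n}]$ is the Pearson-Vajda divergence and the $n = 1$ term vanishes because $\mathbb{E}_{\mu}[\pi/\mu - 1] = 0$. The Python sketch below checks this numerically for one symmetric divergence, assuming the Jeffreys divergence $\mathrm{KL}(\pi\|\mu) + \mathrm{KL}(\mu\|\pi)$ with $f(t) = (t-1)\log t$, whose series coefficients are $(-1)^{n}/(n-1)$. The distributions and function names are hypothetical, and this is not the paper's truncation scheme or the S$f$-AC loss.

```python
import math

def chi_n(pi, mu, n):
    """Pearson-Vajda chi^n divergence E_mu[(pi/mu - 1)^n] for discrete distributions."""
    return sum(m * (p / m - 1.0) ** n for p, m in zip(pi, mu))

def jeffreys(pi, mu):
    """Exact Jeffreys divergence KL(pi||mu) + KL(mu||pi) = sum (pi - mu) log(pi/mu)."""
    return sum((p - m) * math.log(p / m) for p, m in zip(pi, mu))

def jeffreys_taylor(pi, mu, order):
    """Truncated expansion sum_{n=2}^{order} (-1)^n / (n - 1) * chi^n(pi||mu).

    The coefficients f^(n)(1)/n! = (-1)^n/(n-1) come from f(t) = (t - 1) log t.
    The series converges when every ratio pi/mu lies in (0, 2).
    """
    return sum((-1) ** n / (n - 1) * chi_n(pi, mu, n) for n in range(2, order + 1))

# Hypothetical 3-action example: all ratios pi/mu are 1.2 or 0.8, inside (0, 2).
pi = [0.3, 0.4, 0.3]    # hypothetical learned policy
mu = [0.25, 0.5, 0.25]  # hypothetical behavior policy
exact = jeffreys(pi, mu)
for order in (2, 4, 6, 8):
    approx = jeffreys_taylor(pi, mu, order)
    print(f"order={order}: approx={approx:.8f}  |error|={abs(approx - exact):.2e}")
```

The error shrinks geometrically as terms are added because every ratio stays well inside the radius of convergence; ratios near 0 or at 2 and beyond would make a low-order truncation unreliable.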

Abstract

Behavior Regularized Policy Optimization (BRPO) leverages asymmetric (divergence) regularization to mitigate the distribution shift in offline Reinforcement Learning. This paper is the first to study the open question of symmetric regularization. We show that symmetric regularization does not permit an analytic optimal policy $\pi^{*}$, posing a challenge to the practical utility of symmetric BRPO. We approximate $\pi^{*}$ by the Taylor series of Pearson-Vajda $\chi^{n}$ divergences and show that an analytic policy expression exists only when the series is capped at $n = 5$. To compute the solution in a numerically stable manner, we propose to Taylor expand the conditional symmetry term of the symmetric divergence loss, leading to a novel algorithm: Symmetric $f$-Actor Critic (S$f$-AC). S$f$-AC achieves consistently strong results across various D4RL MuJoCo tasks. Additionally, S$f$-AC avoids…
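The contrast between asymmetric and symmetric regularization can be illustrated at a single state with discrete actions. This is a minimal sketch, assuming KL as the asymmetric regularizer and the Jeffreys divergence as the symmetric one. Maximizing $\mathbb{E}_{\pi}[Q] - \tau\,\mathrm{KL}(\pi\|\mu)$ gives the familiar closed form $\pi^{*}(a) \propto \mu(a)\exp(Q(a)/\tau)$. With Jeffreys, the stationarity condition $Q(a) - \tau f'(\pi(a)/\mu(a)) = \lambda$ involves $f'(t) = \log t + 1 - 1/t$, which has no elementary inverse, so the code solves it by nested bisection over the ratio $\pi(a)/\mu(a)$ and over the normalizing multiplier $\lambda$. The Q-values, $\tau$, and function names are hypothetical. The sketch only illustrates why a symmetric regularizer removes the analytic policy; it is not the S$f$-AC update.

```python
import math

def kl(p, m):
    """KL(p || m) for strictly positive discrete distributions."""
    return sum(a * math.log(a / b) for a, b in zip(p, m))

def jeffreys(p, m):
    """Symmetric Jeffreys divergence KL(p || m) + KL(m || p)."""
    return sum((a - b) * math.log(a / b) for a, b in zip(p, m))

def objective(pi, mu, q, tau, div):
    """Regularized objective E_pi[Q] - tau * div(pi, mu) at a single state."""
    return sum(p * qa for p, qa in zip(pi, q)) - tau * div(pi, mu)

def kl_policy(mu, q, tau):
    """Analytic maximizer for KL regularization: pi(a) proportional to mu(a) exp(Q(a)/tau)."""
    q_max = max(q)  # shift by max(Q) to avoid overflow; the shift cancels on normalization
    w = [b * math.exp((qa - q_max) / tau) for b, qa in zip(mu, q)]
    z = sum(w)
    return [x / z for x in w]

def _inverse_fprime(c, iters=200):
    """Solve log r + 1 - 1/r = c for r > 0 (Jeffreys f'); f' is increasing, so bisect.

    The bracket [1e-12, 1e12] covers c up to about 28; larger (Q - lam)/tau
    would need a wider bracket.
    """
    lo, hi = 1e-12, 1e12
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # bisect in log space: r spans many orders of magnitude
        if math.log(mid) + 1.0 - 1.0 / mid < c:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

def jeffreys_policy(mu, q, tau, iters=100):
    """Numerical maximizer for Jeffreys regularization.

    Stationarity: Q(a) - tau * f'(pi(a)/mu(a)) = lam for every action, with lam
    chosen so that pi sums to 1. Total mass decreases in lam, and
    lam in [min Q, max Q] brackets the root.
    """
    def policy(lam):
        return [b * _inverse_fprime((qa - lam) / tau) for b, qa in zip(mu, q)]

    lo, hi = min(q), max(q)
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        if sum(policy(lam)) > 1.0:
            lo = lam  # too much probability mass: raise lam
        else:
            hi = lam
    return policy(0.5 * (lo + hi))

mu = [0.5, 0.3, 0.2]  # hypothetical behavior policy
q = [1.0, 0.5, 0.0]   # hypothetical Q-values
tau = 0.5             # hypothetical regularization strength
pi_kl = kl_policy(mu, q, tau)
pi_j = jeffreys_policy(mu, q, tau)
print("KL-regularized policy:      ", [round(p, 4) for p in pi_kl])
print("Jeffreys-regularized policy:", [round(p, 4) for p in pi_j])
```

Both objectives are concave in $\pi$ (KL and Jeffreys are convex in $\pi$), so each returned policy is the global maximizer of its own objective. The difference is that the KL case needs one line, while the symmetric case needs an iterative solver even for three actions.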

Taxonomy

Topics: Reinforcement Learning in Robotics · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning