Robust Regularized Policy Iteration under Transition Uncertainty

Hongqiang Lin; Zhenghui Fu; Weihao Tang; Pengfei Wang; Yiding Sun; Qixian Huang; Dongxu Zhang

arXiv:2603.09344 · cs.AI · March 17, 2026

Open Access

TL;DR

This paper introduces RRPI, a robust policy iteration method for offline RL that explicitly accounts for transition uncertainty, improving performance and safety under distribution shifts.

Contribution

The paper proposes a novel robust regularized policy iteration framework that handles transition uncertainty with theoretical guarantees and practical efficiency.

Findings

01. RRPI outperforms recent baselines on D4RL benchmarks.

02. RRPI maintains robust performance by aligning low Q-values with high uncertainty.

03. The method guarantees monotonic improvement and convergence in robust policy optimization.

Abstract

Offline reinforcement learning (RL) enables data-efficient and safe policy learning without online exploration, but its performance often degrades under distribution shift. The learned policy may visit out-of-distribution state-action pairs where value estimates and learned dynamics are unreliable. To address policy-induced extrapolation and transition uncertainty in a unified framework, we formulate offline RL as robust policy optimization, treating the transition kernel as a decision variable within an uncertainty set and optimizing the policy against the worst-case dynamics. We propose Robust Regularized Policy Iteration (RRPI), which replaces the intractable max-min bilevel objective with a tractable KL-regularized surrogate and derives an efficient policy iteration procedure based on a robust regularized Bellman operator. We provide theoretical guarantees by showing that the…
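The abstract is cut off above, so the exact form of the robust regularized Bellman operator is not available here. As a rough illustration of the general idea only, the following tabular sketch makes several assumptions that are not taken from the paper: the uncertainty set is a finite ensemble of K transition kernels, the worst case over that ensemble is replaced by a KL-regularized soft-min with temperature `tau`, and the policy update is KL-regularized toward the previous policy with weight `alpha`. The names `soft_min`, `toy_problem`, and `robust_regularized_policy_iteration` are hypothetical.

```python
import numpy as np


def soft_min(values, tau):
    """KL-regularized surrogate for a hard minimum over axis 0.

    Computes -tau * log(mean(exp(-values / tau))), which equals
    min_q E_q[values] + tau * KL(q || uniform). It lies between the
    hard minimum and the mean, and approaches the minimum as tau -> 0.
    """
    z = -values / tau
    m = z.max(axis=0)  # subtract the max for numerical stability
    return -tau * (m + np.log(np.exp(z - m).mean(axis=0)))


def robust_regularized_policy_iteration(P, r, gamma=0.9, tau=0.1,
                                        alpha=0.5, n_iters=50, n_eval=200):
    """Illustrative sketch only, NOT the paper's algorithm.

    P: array of shape (K, S, A, S), an assumed finite uncertainty set.
    r: array of shape (S, A), rewards.
    """
    K, S, A, _ = P.shape
    log_pi = np.full((S, A), -np.log(A))  # start from the uniform policy
    q = np.zeros((S, A))
    for _ in range(n_iters):
        pi = np.exp(log_pi)
        # Policy evaluation under soft worst-case dynamics.
        for _ in range(n_eval):
            v = (pi * q).sum(axis=1)          # shape (S,)
            next_v = P @ v                    # shape (K, S, A)
            q = r + gamma * soft_min(next_v, tau)
        # KL-regularized improvement: pi_new proportional to pi * exp(q / alpha).
        logits = log_pi + q / alpha
        logits -= logits.max(axis=1, keepdims=True)
        log_pi = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return np.exp(log_pi), q


def toy_problem():
    """Two states, two actions, two candidate kernels (all hypothetical).

    State 1 is absorbing with zero reward. In state 0, action 0 is safe
    (reward 0.5, stays in state 0 under both kernels); action 1 pays 1.0
    but leads to the absorbing state under the second kernel.
    """
    P = np.zeros((2, 2, 2, 2))
    P[:, 1, :, 1] = 1.0   # state 1 is absorbing under every kernel
    P[:, 0, 0, 0] = 1.0   # safe action always stays in state 0
    P[0, 0, 1, 0] = 1.0   # kernel 0: risky action stays in state 0
    P[1, 0, 1, 1] = 1.0   # kernel 1: risky action falls into state 1
    r = np.array([[0.5, 1.0],
                  [0.0, 0.0]])
    return P, r


if __name__ == "__main__":
    P, r = toy_problem()
    pi, q = robust_regularized_policy_iteration(P, r)
    print("policy:\n", pi)
    print("robust Q:\n", q)
```

In this toy problem the risky action looks better under the first kernel alone, but the soft worst-case backup penalizes it because one kernel in the assumed set sends it to the absorbing state, so the iteration settles on the safe action in state 0. A learned dynamics ensemble and function approximation, as an offline RL method on D4RL would need, are outside the scope of this sketch.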

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Model Reduction and Neural Networks