Robust Offline Reinforcement Learning with Linearly Structured f-Divergence Regularization

Cheng Tang; Zhishuai Liu; Pan Xu

arXiv:2411.18612 · cs.LG · November 3, 2025

TL;DR

This paper introduces a new framework for offline robust reinforcement learning that imposes structured, $f$-divergence-based regularization on the transition dynamics, yielding policies that are robust to dynamics shifts and a learning algorithm that is computationally efficient.

Contribution

The paper proposes the $d$-rectangular linear RRMDP framework and the R2PVI algorithm, which introduces latent structure into both the transition kernels and the regularization and uses linear function approximation for robust policy learning, with theoretical guarantees.
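To make the "$d$-rectangular linear" structure more concrete, the block below writes out one plausible form of it. It is a sketch under stated assumptions, not the paper's exact definitions: we assume the nominal kernel at step $h$ is a mixture of $d$ latent factors $\mu^0_{h,i}$ with nonnegative feature weights $\phi_i(s,a)$ summing to one, and that the $f$-divergence penalty (with strength $\lambda$) is applied factor by factor and weighted by $\phi_i(s,a)$.

```latex
% Assumed d-rectangular linear form of the nominal transition kernel at step h
P^0_h(s' \mid s,a) = \sum_{i=1}^{d} \phi_i(s,a)\, \mu^0_{h,i}(s'),
\qquad \phi_i(s,a) \ge 0, \quad \sum_{i=1}^{d} \phi_i(s,a) = 1.

% f-divergence between distributions (f convex, f(1) = 0)
D_f(\mu \,\|\, \mu^0) = \sum_{s'} \mu^0(s')\, f\!\left(\frac{\mu(s')}{\mu^0(s')}\right).

% Assumed regularized Bellman backup: each latent factor is perturbed separately
Q_h(s,a) = r_h(s,a) + \sum_{i=1}^{d} \phi_i(s,a)
  \inf_{\mu_{h,i} \in \Delta(\mathcal{S})}
  \Big\{ \mathbb{E}_{s' \sim \mu_{h,i}}\big[V_{h+1}(s')\big]
  + \lambda\, D_f\big(\mu_{h,i} \,\|\, \mu^0_{h,i}\big) \Big\}.
```

Under this form the adversary can only reweight the $d$ latent factors, so perturbed kernels stay inside the span of the latent structure rather than ranging over arbitrary next-state distributions for every $(s,a)$ pair. For the KL divergence ($f(t) = t\log t$) each inner infimum has the closed form $-\lambda \log \mathbb{E}_{s' \sim \mu^0_{h,i}}\big[e^{-V_{h+1}(s')/\lambda}\big]$.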

Findings

01

R2PVI achieves near-optimal suboptimality bounds.

02

Numerical experiments show R2PVI learns robust policies effectively.

03

R2PVI outperforms baseline methods in computational efficiency.

Abstract

The Robust Regularized Markov Decision Process (RRMDP) is proposed to learn policies robust to dynamics shifts by adding regularization to the transition dynamics in the value function. Existing methods mostly use unstructured regularization, potentially leading to conservative policies under unrealistic transitions. To address this limitation, we propose a novel framework, the $d$-rectangular linear RRMDP ($d$-RRMDP), which introduces latent structures into both transition kernels and regularization. We focus on offline reinforcement learning, where an agent learns policies from a precollected dataset in the nominal environment. We develop the Robust Regularized Pessimistic Value Iteration (R2PVI) algorithm that employs linear function approximation for robust policy learning in $d$-RRMDPs with $f$-divergence-based regularization terms on transition kernels. We provide…
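For the KL divergence, a common instance of $f$-divergence regularization, the inner regularized problem has a well-known closed form: $\inf_{P}\{\mathbb{E}_P[V] + \lambda\,\mathrm{KL}(P\,\|\,P^0)\} = -\lambda \log \mathbb{E}_{P^0}[e^{-V/\lambda}]$, attained by $P^\star \propto P^0 e^{-V/\lambda}$. The minimal Python sketch below checks this identity numerically and plugs it into a one-step backup that weights each latent factor's regularized value by $\phi_i(s,a)$. All function names are hypothetical, the nominal factors `mu0` are assumed known, and the factor-wise weighting is our assumption about how the $d$-rectangular structure enters. This is not the authors' R2PVI, which estimates such quantities from offline data using linear function approximation and pessimism.

```python
import math


def kl_regularized_value(v, p0, lam):
    """Closed form of inf_P { E_P[v] + lam * KL(P || p0) } = -lam * log E_p0[exp(-v / lam)].

    min(v) is factored out so that exp() cannot overflow when lam is small.
    """
    v_min = min(v)
    z = sum(p * math.exp(-(x - v_min) / lam) for x, p in zip(v, p0))
    return v_min - lam * math.log(z)


def gibbs_minimizer(v, p0, lam):
    """Minimizing distribution P*(s) proportional to p0(s) * exp(-v(s) / lam)."""
    v_min = min(v)
    w = [p * math.exp(-(x - v_min) / lam) for x, p in zip(v, p0)]
    z = sum(w)
    return [wi / z for wi in w]


def primal_objective(p, v, p0, lam):
    """E_P[v] + lam * KL(P || p0) evaluated directly (assumes p0 > 0 wherever p > 0)."""
    expected = sum(pi * x for pi, x in zip(p, v))
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, p0) if pi > 0)
    return expected + lam * kl


def d_rectangular_backup(r, phi, v, mu0, lam):
    """Hypothetical one-step backup: r + sum_i phi_i * (KL-regularized value under factor mu0[i])."""
    return r + sum(f * kl_regularized_value(v, m, lam) for f, m in zip(phi, mu0))


# Usage: three next states, two latent factors with known nominal distributions.
v_next = [0.0, 1.0, 2.0]
mu0 = [[1 / 3, 1 / 3, 1 / 3], [0.7, 0.2, 0.1]]
q = d_rectangular_backup(r=0.5, phi=[0.4, 0.6], v=v_next, mu0=mu0, lam=0.5)
print(q)
```

A large `lam` recovers the nominal expectation $\mathbb{E}_{P^0}[V]$, while a small `lam` pushes the value toward $\min V$, so the regularization strength directly trades robustness against conservatism.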

Videos

Robust Offline Reinforcement Learning with Linearly Structured $f$-Divergence Regularization (SlidesLive)

Taxonomy

Topics: Adaptive Dynamic Programming Control · Distributed Sensor Networks and Detection Algorithms · Distributed Control Multi-Agent Systems

Methods: Sparse Evolutionary Training