Robust Offline Reinforcement Learning with Linearly Structured f-Divergence Regularization
Cheng Tang, Zhishuai Liu, Pan Xu

TL;DR
This paper introduces a framework for offline robust reinforcement learning that applies structured $f$-divergence regularization, yielding policies that are robust to dynamics shifts and an algorithm that is computationally efficient.
Contribution
The paper proposes the $d$-rectangular linear RRMDP framework and the R2PVI algorithm, integrating latent structures and linear approximation for robust policy learning with theoretical guarantees.
Findings
R2PVI achieves near-optimal suboptimality bounds.
Numerical experiments show R2PVI learns robust policies effectively.
R2PVI outperforms baseline methods in computational efficiency.
Abstract
The Robust Regularized Markov Decision Process (RRMDP) is proposed to learn policies robust to dynamics shifts by adding regularization to the transition dynamics in the value function. Existing methods mostly use unstructured regularization, potentially leading to conservative policies under unrealistic transitions. To address this limitation, we propose a novel framework, the $d$-rectangular linear RRMDP ($d$-RRMDP), which introduces latent structures into both transition kernels and regularization. We focus on offline reinforcement learning, where an agent learns policies from a precollected dataset in the nominal environment. We develop the Robust Regularized Pessimistic Value Iteration (R2PVI) algorithm that employs linear function approximation for robust policy learning in $d$-RRMDPs with $f$-divergence based regularization terms on transition kernels. We provide…
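To make the idea of regularizing the transition dynamics concrete, here is a minimal sketch for the simplest case: a finite next-state distribution with KL divergence as the $f$-divergence. It relies on the standard duality $\inf_P \{ E_P[V] + \lambda\, \mathrm{KL}(P \| P^0) \} = -\lambda \log E_{P^0}[\exp(-V/\lambda)]$. This is an illustration only, not the paper's R2PVI algorithm or its $d$-rectangular linear structure; the function names, the toy distribution `p0`, the values `v`, and the weight `lam` are all assumptions made for this example.

```python
import math

def kl_regularized_value(p0, v, lam):
    """Closed form of inf_P { E_P[v] + lam * KL(P || p0) }.

    Equals -lam * log E_{p0}[exp(-v / lam)]; v is shifted by its
    minimum before exponentiating for numerical stability.
    """
    v_min = min(v)
    s = sum(p * math.exp(-(x - v_min) / lam) for p, x in zip(p0, v))
    return v_min - lam * math.log(s)

def worst_case_kernel(p0, v, lam):
    """Minimizing distribution: P*(s) proportional to p0(s) * exp(-v(s) / lam)."""
    v_min = min(v)
    w = [p * math.exp(-(x - v_min) / lam) for p, x in zip(p0, v)]
    z = sum(w)
    return [x / z for x in w]

def penalized_objective(p, p0, v, lam):
    """E_p[v] + lam * KL(p || p0) for an arbitrary candidate kernel p."""
    ev = sum(pi * vi for pi, vi in zip(p, v))
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, p0) if pi > 0)
    return ev + lam * kl

# Hypothetical toy inputs: nominal next-state distribution and next-state values.
p0 = [0.5, 0.3, 0.2]
v = [1.0, 0.0, 2.0]
lam = 0.5

robust_value = kl_regularized_value(p0, v, lam)
p_star = worst_case_kernel(p0, v, lam)
nominal_value = sum(p * x for p, x in zip(p0, v))
```

In this sketch the regularized value always lies between the worst next-state value `min(v)` and the nominal expectation `nominal_value`. A larger `lam` penalizes deviation from the nominal kernel more heavily and pulls the result toward the nominal expectation, while a smaller `lam` makes it more conservative.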
Taxonomy
Topics: Adaptive Dynamic Programming Control · Distributed Sensor Networks and Detection Algorithms · Distributed Control Multi-Agent Systems
Methods: Sparse Evolutionary Training
