Optimal Conservative Offline RL with General Function Approximation via Augmented Lagrangian
Paria Rashidinejad, Hanlin Zhu, Kunhe Yang, Stuart Russell, Jiantao Jiao

TL;DR
This paper introduces a practical, statistically optimal offline RL algorithm that applies an augmented Lagrangian method within the marginalized importance sampling (MIS) framework. The algorithm handles general function approximation without requiring conservative regularization.
Contribution
The paper presents the first offline RL algorithm that is both statistically optimal and practical under general function approximation, bypassing the need for uncertainty quantification and conservative regularization.
Findings
Achieves statistical optimality in offline RL with general function approximation.
Uses an augmented Lagrangian method to enforce occupancy constraints.
Eliminates the need for conservative regularization, improving practicality.
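For readers unfamiliar with the technique named in the findings, the following is a minimal sketch of a generic augmented Lagrangian (method of multipliers) loop on a toy problem. It is not the paper's algorithm: the objective (projecting a point `c` onto the hyperplane `a @ x == b`), the function name `augmented_lagrangian_projection`, and the parameters `rho` and `iters` are all illustrative assumptions. In the paper's setting the equality constraint would presumably be replaced by the occupancy validity constraints, but that mapping is not shown here.

```python
import numpy as np


def augmented_lagrangian_projection(c, a, b, rho=1.0, iters=50):
    """Toy example (hypothetical, not from the paper).

    Minimize 0.5 * ||x - c||^2 subject to a @ x == b using the
    augmented Lagrangian
        L(x, lam) = 0.5 * ||x - c||^2 + lam * (a @ x - b)
                    + (rho / 2) * (a @ x - b)^2.
    """
    c = np.asarray(c, dtype=float)
    a = np.asarray(a, dtype=float)
    lam = 0.0  # Lagrange multiplier estimate
    x = c.copy()

    # For this quadratic objective the inner minimization over x is a
    # linear system: (I + rho * a a^T) x = c - lam * a + rho * b * a.
    hessian = np.eye(c.size) + rho * np.outer(a, a)

    for _ in range(iters):
        rhs = c - lam * a + rho * b * a
        x = np.linalg.solve(hessian, rhs)  # primal step
        lam += rho * (a @ x - b)           # dual (multiplier) step

    return x, lam


c = np.array([1.0, 2.0, 3.0])
a = np.array([1.0, 1.0, 1.0])
b = 1.0
x, lam = augmented_lagrangian_projection(c, a, b)
print("x =", x)
print("constraint violation =", abs(a @ x - b))
```

The quadratic penalty term pushes each inner solution toward feasibility, while the multiplier update removes the remaining bias, so the constraint violation shrinks without `rho` having to grow without bound. This is the general property that makes the method attractive for enforcing constraints; how it is used in the paper should be checked against the full text.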
Abstract
Offline reinforcement learning (RL), which refers to decision-making from a previously-collected dataset of interactions, has received significant attention over the past years. Much effort has focused on improving offline RL practicality by addressing the prevalent issue of partial data coverage through various forms of conservative policy learning. While the majority of algorithms do not have finite-sample guarantees, several provable conservative offline RL algorithms are designed and analyzed within the single-policy concentrability framework that handles partial coverage. Yet, in the nonlinear function approximation setting where confidence intervals are difficult to obtain, existing provable algorithms suffer from computational intractability, prohibitively strong assumptions, and suboptimal statistical rates. In this paper, we leverage the marginalized importance sampling (MIS)…
