On the Complexity of Offline Reinforcement Learning with $Q^\star$-Approximation and Partial Coverage

Haolin Liu; Braham Snyder; Chen-Yu Wei

arXiv:2602.12107·cs.LG·February 13, 2026



Open Access

TL;DR

This paper investigates the theoretical limits of offline reinforcement learning with $Q^\star$-approximation and partial coverage, introducing a new complexity framework and improving sample complexity bounds for certain algorithms.

Contribution

The paper introduces a general complexity framework, inspired by model-free decision-estimation coefficients (DEC), that yields new bounds for offline RL under partial coverage and $Q^\star$-realizability.

Findings

01

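Established an information-theoretic lower bound showing that $Q^\star$-realizability and Bellman completeness alone are not sufficient for sample-efficient offline RL under partial coverage.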

02

Achieved a $\tfrac{1}{2}$-order sample complexity for soft Q-learning.

03

Provided the first characterization of offline learnability for low-Bellman-rank MDPs without Bellman completeness.
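As a rough illustration of what "soft Q-learning" on a fixed dataset means (Finding 02), here is a minimal tabular sketch of offline soft Q-iteration. It assumes finite states and actions, a discount `gamma`, a temperature `tau`, and a dataset of `(s, a, r, s_next)` tuples; the function names are hypothetical. It is a generic textbook-style backup, not the algorithm or analysis from the paper, and it does not reproduce the paper's sample complexity guarantee.

```python
import math
from collections import defaultdict


def soft_value(q_row, tau):
    """Soft (log-sum-exp) state value tau * log sum_a exp(Q(s,a)/tau), computed stably."""
    m = max(q_row)
    return m + tau * math.log(sum(math.exp((q - m) / tau) for q in q_row))


def offline_soft_q_iteration(dataset, n_states, n_actions, gamma=0.9, tau=0.1, n_iters=200):
    """Hypothetical tabular fitted soft Q-iteration on a fixed offline dataset.

    dataset: list of (s, a, r, s_next) tuples from some behavior policy.
    Pairs (s, a) that never appear in the data keep their initial value 0.0;
    this arbitrary fill-in is exactly where partial coverage bites.
    """
    # Group transitions by (s, a) so each backup is an empirical average.
    by_pair = defaultdict(list)
    for s, a, r, s_next in dataset:
        by_pair[(s, a)].append((r, s_next))

    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(n_iters):
        new_q = [row[:] for row in q]
        for (s, a), samples in by_pair.items():
            # Soft Bellman target: r + gamma * soft value of the next state.
            targets = [r + gamma * soft_value(q[s_next], tau) for r, s_next in samples]
            new_q[s][a] = sum(targets) / len(targets)
        q = new_q
    return q


# Toy usage: action 1 in state 0 is never observed (uncovered).
data = [(0, 0, 1.0, 1), (1, 0, 0.0, 1), (1, 1, 0.0, 1)]
q = offline_soft_q_iteration(data, n_states=2, n_actions=2)
```

In this toy dataset, both actions in state 1 converge to $\gamma\tau\ln 2/(1-\gamma)$, while `q[0][1]` stays at its initial value because the data never covers that pair. A pessimistic method would instead push such uncovered values down.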

Abstract

We study offline reinforcement learning under $Q^\star$-approximation and partial coverage, a setting that motivates practical algorithms such as Conservative $Q$-Learning (CQL; Kumar et al., 2020) but has received limited theoretical attention. Our work is inspired by the following open question: "Are $Q^\star$-realizability and Bellman completeness sufficient for sample-efficient offline RL under partial coverage?" We answer in the negative by establishing an information-theoretic lower bound. Going substantially beyond this, we introduce a general framework that characterizes the intrinsic complexity of a given $Q^\star$ function class, inspired by model-free decision-estimation coefficients (DEC) for online RL (Foster et al., 2023b; Liu et al., 2025b). This complexity recovers and improves the quantities underlying the guarantees of Chen and Jiang (2022) and Uehara et al. (2023),…
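The open question above uses three standard notions. A sketch of one common way to write them follows. It assumes a discounted MDP with reward $r$, transition kernel $P$, discount $\gamma$, a function class $\mathcal{F}$, a data distribution $\mu$, and the optimal policy's occupancy measure $d^{\pi^\star}$. The paper's own definitions (for example, horizon structure or the exact coverage coefficient) may differ.

```latex
\begin{align*}
(\mathcal{T} f)(s,a) &= r(s,a) + \gamma\, \mathbb{E}_{s' \sim P(\cdot \mid s,a)}\Big[\max_{a'} f(s',a')\Big]
  && \text{(Bellman optimality operator)} \\
Q^\star &\in \mathcal{F}
  && \text{($Q^\star$-realizability)} \\
\mathcal{T} f &\in \mathcal{F} \quad \forall f \in \mathcal{F}
  && \text{(Bellman completeness)} \\
C^{\pi^\star} &= \max_{s,a} \frac{d^{\pi^\star}(s,a)}{\mu(s,a)} < \infty
  && \text{(partial coverage: the data need only cover } \pi^\star\text{)}
\end{align*}
```

Full coverage would instead require a bounded density ratio for every policy, not just for $\pi^\star$. Realizability constrains only the single function $Q^\star$, whereas completeness constrains the image of $\mathcal{T}$ on all of $\mathcal{F}$.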


Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research