On the Complexity of Offline Reinforcement Learning with $Q^\star$-Approximation and Partial Coverage
Haolin Liu, Braham Snyder, Chen-Yu Wei

TL;DR
This paper investigates the theoretical limits of offline reinforcement learning with $Q^\star$-approximation and partial coverage, introducing a new complexity framework and improving sample complexity bounds for certain algorithms.
Contribution
It introduces a general complexity framework inspired by decision-estimation coefficients, providing new insights and bounds for offline RL under partial coverage and $Q^\star$-realizability.
Findings
Established an information-theoretic lower bound for the setting.
Achieved a $\tfrac{1}{2}$-order sample complexity for soft Q-learning.
Provided the first characterization of offline learnability for low-Bellman-rank MDPs without Bellman completeness.
Abstract
We study offline reinforcement learning under $Q^\star$-approximation and partial coverage, a setting that motivates practical algorithms such as Conservative Q-Learning (CQL; Kumar et al., 2020) but has received limited theoretical attention. Our work is inspired by the following open question: "Are $Q^\star$-realizability and Bellman completeness sufficient for sample-efficient offline RL under partial coverage?" We answer in the negative by establishing an information-theoretic lower bound. Going substantially beyond this, we introduce a general framework that characterizes the intrinsic complexity of a given function class, inspired by model-free decision-estimation coefficients (DEC) for online RL (Foster et al., 2023b; Liu et al., 2025b). This complexity recovers and improves the quantities underlying the guarantees of Chen and Jiang (2022) and Uehara et al. (2023),…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research
