What are the Statistical Limits of Offline RL with Linear Function Approximation?
Ruosong Wang, Dean P. Foster, Sham M. Kakade

TL;DR
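This paper investigates the fundamental limits of offline reinforcement learning with linear function approximation. It shows that even when the value function of every policy is linear in the given features and the offline data have good feature coverage, policy evaluation can require a number of samples exponential in the horizon, unless substantially stronger assumptions hold.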
Contribution
It establishes necessary conditions for sample-efficient offline RL, showing that realizability and good coverage alone are insufficient for efficient policy evaluation.
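To make "good coverage" concrete: one common way to formalize it (assumed here, and not necessarily the paper's exact definition) is that the empirical covariance of the logged features is well conditioned, i.e. its smallest eigenvalue is bounded away from zero. The sketch below computes that quantity; the function name `feature_coverage` is hypothetical.

```python
import numpy as np

def feature_coverage(features: np.ndarray) -> float:
    """Smallest eigenvalue of the empirical feature covariance.

    features: array of shape (n, d), one row phi(s, a) per logged sample.
    A value bounded away from zero means the offline data excite every
    direction of the d-dimensional feature space (one notion of coverage).
    """
    n = features.shape[0]
    sigma = features.T @ features / n  # empirical E[phi phi^T], shape (d, d)
    # eigvalsh handles symmetric matrices and returns eigenvalues in ascending order
    return float(np.linalg.eigvalsh(sigma)[0])
```

For example, logging each standard basis vector of a 4-dimensional feature space once gives a covariance of one quarter times the identity, so the coverage is 0.25; if some feature direction never appears in the data, the coverage is 0. The paper's point is that even a well-conditioned covariance of this kind, combined with realizability, does not by itself make offline policy evaluation sample-efficient.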
Findings
Sample complexity is exponential in the horizon under realizability and coverage assumptions.
Substantially stronger conditions, such as low distribution shift between the offline data and the target policy, are required for sample efficiency.
Highlights fundamental limitations of current offline RL approaches.
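For readers unfamiliar with the setting, the following is a minimal sketch of one standard offline policy-evaluation estimator with linear features, fitted Q evaluation (FQE) over a finite horizon. It is not the paper's construction; since the paper's lower bound concerns what any offline method can achieve, FQE is simply one concrete estimator it constrains. The function name `linear_fqe`, the data layout, and the ridge parameter `reg` are illustrative assumptions.

```python
import numpy as np

def linear_fqe(data, init_features, reg=1e-6):
    """Fitted Q evaluation with linear features over a finite horizon H.

    data: list of length H; data[h] = (phi, rewards, phi_next) where
      phi      (n, d): features phi(s_h, a_h) of logged state-action pairs,
      rewards  (n,)  : observed rewards r_h,
      phi_next (n, d): features phi(s_{h+1}, pi(s_{h+1})) under the target
                       policy pi (ignored at the last step, since Q_{H+1} = 0).
    init_features (d,): phi(s_1, pi(s_1)) for the initial state.
    Returns the estimated value of pi at s_1.
    """
    d = init_features.shape[0]
    theta = np.zeros(d)  # parameters of Q_{H+1}, which is identically zero
    # Sweep backwards from step H to step 1.
    for phi, rewards, phi_next in reversed(data):
        # Bellman targets: reward plus the current estimate of the next-step value.
        targets = rewards + phi_next @ theta
        # Ridge-regularized least squares for the step-h parameters.
        gram = phi.T @ phi + reg * np.eye(d)
        theta = np.linalg.solve(gram, phi.T @ targets)
    return float(init_features @ theta)
```

With a single constant feature and a reward of 1 at each of two steps, the estimate is approximately 2, the true two-step return. Each backward step regresses on targets built from the previous fit, so errors can compound across the horizon; the paper's result shows that, under realizability and coverage alone, no offline method can avoid an exponential-in-horizon sample requirement in the worst case.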
Abstract
Offline reinforcement learning seeks to utilize offline (observational) data to guide the learning of (causal) sequential decision making strategies. The hope is that offline reinforcement learning coupled with function approximation methods (to deal with the curse of dimensionality) can provide a means to help alleviate the excessive sample complexity burden in modern sequential decision making problems. However, the extent to which this broader approach can be effective is not well understood, where the literature largely consists of sufficient conditions. This work focuses on the basic question of what are necessary representational and distributional conditions that permit provable sample-efficient offline reinforcement learning. Perhaps surprisingly, our main result shows that even if: i) we have realizability in that the true value function of \emph{every} policy is linear in a…
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Advanced Bandit Algorithms Research
