On The Statistical Complexity of Offline Decision-Making
Thanh Nguyen-Tang, Raman Arora

TL;DR
This paper investigates the fundamental limits of offline decision-making with function approximation, providing near-optimal rates for stochastic contextual bandits and MDPs, and introducing a new data coverage measure.
Contribution
It establishes (near) minimax-optimal rates in terms of the pseudo-dimension of the (value) function class and introduces a novel characterization of the behavior policy that strictly subsumes previous notions of data coverage.
Findings
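Derived (near) minimax-optimal rates for offline stochastic contextual bandits and Markov decision processes with function approximation.
Introduced a new characterization of the behavior policy that strictly subsumes all previous notions of data coverage.
Showed nearly minimax-optimal rates for using offline data in online decision-making across a wide range of regimes.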
Abstract
We study the statistical complexity of offline decision-making with function approximation, establishing (near) minimax-optimal rates for stochastic contextual bandits and Markov decision processes. The performance limits are captured by the pseudo-dimension of the (value) function class and a new characterization of the behavior policy that strictly subsumes all the previous notions of data coverage in the offline decision-making literature. In addition, we seek to understand the benefits of using offline data in online decision-making and show nearly minimax-optimal rates in a wide range of regimes.
