On The Statistical Complexity of Offline Decision-Making

Thanh Nguyen-Tang; Raman Arora

arXiv:2501.06339·cs.LG·January 14, 2025

TL;DR

This paper investigates the fundamental limits of offline decision-making with function approximation, providing near-optimal rates for stochastic contextual bandits and MDPs, and introducing a new data coverage measure.

Contribution

The paper establishes (near) minimax-optimal rates governed by the pseudo-dimension of the (value) function class, and introduces a new characterization of the behavior policy that strictly subsumes previous notions of data coverage.

Findings

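01. Derived (near) minimax-optimal rates for offline stochastic contextual bandits and Markov decision processes.

02. Introduced a new characterization of the behavior policy that strictly subsumes previous notions of data coverage.

03. Showed nearly minimax-optimal rates for using offline data in online decision-making across a wide range of regimes.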

Abstract

We study the statistical complexity of offline decision-making with function approximation, establishing (near) minimax-optimal rates for stochastic contextual bandits and Markov decision processes. The performance limits are captured by the pseudo-dimension of the (value) function class and a new characterization of the behavior policy that strictly subsumes all the previous notions of data coverage in the offline decision-making literature. In addition, we seek to understand the benefits of using offline data in online decision-making and show nearly minimax-optimal rates in a wide range of regimes.
