Trajectory Data Suffices for Statistically Efficient Learning in Offline RL with Linear $q^\pi$-Realizability and Concentrability

Volodymyr Tkachuk; Gellért Weisz; Csaba Szepesvári

arXiv:2405.16809·cs.LG·May 28, 2024

TL;DR

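This paper proves that trajectory data enables statistically efficient offline RL in linear $q^\pi$-realizable MDPs under concentrability, overcoming an earlier negative result for transition data. The required sample size is polynomial in the feature dimension $d$, the horizon $H$, and the concentrability coefficient $C_{conc}$, and does not depend on the number of states.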

Contribution

It demonstrates that trajectory data suffices for statistically efficient offline RL in linear $q^\pi$-realizable MDPs, overcoming prior limitations with transition data.

Findings

01. Trajectory data allows offline RL with a polynomial sample size that is independent of the size of the state space.

02. Linear MDP approximation facilitates estimation with trajectory data.

03. Negative results with transition data do not extend to trajectory data.

Abstract

We consider offline reinforcement learning (RL) in $H$-horizon Markov decision processes (MDPs) under the linear $q^\pi$-realizability assumption, where the action-value function of every policy is linear with respect to a given $d$-dimensional feature function. The hope in this setting is that learning a good policy will be possible without requiring a sample size that scales with the number of states in the MDP. Foster et al. [2021] have shown this to be impossible even under concentrability, a data coverage assumption where a coefficient $C_{conc}$ bounds the extent to which the state-action distribution of any policy can veer off the data distribution. However, the data in this previous work was in the form of a sequence of individual transitions. This leaves open the question of whether the negative result mentioned could be overcome if the data was composed of…
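
The two assumptions named in the abstract can be written out as a rough sketch. The symbols $\phi_h$ (feature map), $\theta_h^\pi$ (policy-dependent parameter vector), $d_h^\pi$ (state-action distribution of policy $\pi$ at stage $h$), and $\mu_h$ (data distribution at stage $h$) are notation assumed here for illustration; the excerpt above does not fix them, and the paper's own formal statements may differ in detail.

```latex
% Assumed notation (not from the page excerpt): \phi_h is the feature map,
% \theta_h^\pi a policy-dependent parameter vector, d_h^\pi the state-action
% distribution of \pi at stage h, and \mu_h the data distribution at stage h.

% Linear q^\pi-realizability: for every policy \pi and every stage h \in [H],
% the action-value function is linear in the d-dimensional features.
\[
  q_h^\pi(s,a) = \langle \phi_h(s,a), \theta_h^\pi \rangle
  \quad \text{for some } \theta_h^\pi \in \mathbb{R}^d
  \text{ and all } (s,a).
\]

% Concentrability: no policy's state-action distribution exceeds the data
% distribution by more than a factor of C_conc.
\[
  \sup_{\pi}\, \max_{h \in [H]}\, \sup_{(s,a)}
  \frac{d_h^\pi(s,a)}{\mu_h(s,a)} \le C_{\mathrm{conc}}.
\]
```

Read this way, realizability constrains the value functions of all policies, while concentrability constrains only how well the data covers the state-action pairs that policies can reach.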

Videos

Trajectory Data Suffices for Statistically Efficient Learning in Offline RL with Linear $q^\pi$-Realizability and Concentrability · SlidesLive

Taxonomy

Topics: Machine Learning and Algorithms · Speech and Audio Processing · Target Tracking and Data Fusion in Sensor Networks