On Instance-Dependent Bounds for Offline Reinforcement Learning with Linear Function Approximation
Thanh Nguyen-Tang, Ming Yin, Sunil Gupta, Svetha Venkatesh, Raman Arora

TL;DR
This paper introduces an algorithm for offline reinforcement learning with linear function approximation that achieves faster, instance-dependent convergence rates and can attain zero sub-optimality under certain conditions, improving over prior bounds.
Contribution
The work presents the first $\tilde{O}(1/K)$ instance-dependent bound and zero sub-optimality guarantee for offline RL with linear function approximation from adaptively collected data.
Findings
Achieves $\tilde{O}(1/K)$ convergence rate under partial coverage.
Attains zero sub-optimality error once the number of episodes $K$ exceeds a finite, instance-dependent threshold.
Provides matching lower bounds for offline RL with linear approximation.
Abstract
Sample-efficient offline reinforcement learning (RL) with linear function approximation has recently been studied extensively. Much of prior work has yielded the minimax-optimal bound of $\tilde{O}(1/\sqrt{K})$, with $K$ being the number of episodes in the offline data. In this work, we seek to understand instance-dependent bounds for offline RL with function approximation. We present an algorithm called Bootstrapped and Constrained Pessimistic Value Iteration (BCP-VI), which leverages data bootstrapping and constrained optimization on top of pessimism. We show that under a partial data coverage assumption, that of concentrability with respect to an optimal policy, the proposed algorithm yields a fast rate of $\tilde{O}(1/K)$ for offline RL when there is a positive gap in the optimal Q-value functions, even when the offline data were adaptively…
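The pessimism principle that BCP-VI builds on can be illustrated with a minimal sketch. This is not the authors' algorithm: it omits the bootstrapping and constrained-optimization components and shows only generic pessimistic least-squares value iteration with linear features. The function name, the array layout, and the parameters `beta` and `lam` are assumptions made for illustration.

```python
import numpy as np


def pessimistic_value_iteration(feat, states, actions, rewards, next_states,
                                horizon, beta=1.0, lam=1.0):
    """Generic pessimistic value iteration with linear features (illustrative).

    feat:        (S, A, d) feature vector phi(s, a) for every state-action pair
    states:      (K, H) int, state visited at step h of episode k
    actions:     (K, H) int, action taken at step h of episode k
    rewards:     (K, H) float, reward received at step h of episode k
    next_states: (K, H) int, state reached after step h of episode k
    beta, lam:   hypothetical penalty scale and ridge regularizer
    """
    n_states, _, dim = feat.shape
    policy = np.zeros((horizon, n_states), dtype=int)
    values = np.zeros((horizon, n_states))
    v_next = np.zeros(n_states)  # value after the final step is zero

    for h in reversed(range(horizon)):
        phi = feat[states[:, h], actions[:, h]]        # (K, d) observed features
        gram = phi.T @ phi + lam * np.eye(dim)         # regularized Gram matrix
        target = rewards[:, h] + v_next[next_states[:, h]]
        w = np.linalg.solve(gram, phi.T @ target)      # ridge regression weights

        # Uncertainty penalty beta * sqrt(phi^T gram^{-1} phi) for every (s, a):
        # large where the offline data cover the feature direction poorly.
        gram_inv = np.linalg.inv(gram)
        penalty = beta * np.sqrt(
            np.einsum("sad,de,sae->sa", feat, gram_inv, feat))

        # Pessimistic Q estimate, clipped to the valid range of returns
        # (assumes rewards lie in [0, 1]).
        q = np.clip(feat @ w - penalty, 0.0, horizon - h)
        policy[h] = q.argmax(axis=1)
        values[h] = q.max(axis=1)
        v_next = values[h]

    return policy, values
```

In a one-step problem with two actions, where one action appears 100 times with reward 0.8 and the other appears once with reward 1.0, the penalty makes this sketch prefer the well-covered action. That is the behavior pessimism is meant to produce when coverage is only partial.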
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning and Data Classification · Advanced Bandit Algorithms Research
