On Instance-Dependent Bounds for Offline Reinforcement Learning with Linear Function Approximation

Thanh Nguyen-Tang; Ming Yin; Sunil Gupta; Svetha Venkatesh; Raman Arora

arXiv:2211.13208·cs.LG·January 30, 2023

TL;DR

This paper introduces an algorithm for offline reinforcement learning with linear function approximation that achieves faster, instance-dependent convergence rates and can attain zero sub-optimality under certain conditions, improving over prior bounds.

Contribution

The work presents the first $\tilde{O}(1/K)$ instance-dependent bound and zero sub-optimality guarantee for offline RL with linear function approximation from adaptively collected data.

Findings

01 Achieves a $\tilde{O}(1/K)$ convergence rate under partial coverage.

02 Attains zero sub-optimality error beyond a finite threshold.

03 Provides matching lower bounds for offline RL with linear approximation.

Abstract

Sample-efficient offline reinforcement learning (RL) with linear function approximation has recently been studied extensively. Much of the prior work has yielded the minimax-optimal bound of $\tilde{O}(\frac{1}{\sqrt{K}})$, with $K$ being the number of episodes in the offline data. In this work, we seek to understand instance-dependent bounds for offline RL with function approximation. We present an algorithm called Bootstrapped and Constrained Pessimistic Value Iteration (BCP-VI), which leverages data bootstrapping and constrained optimization on top of pessimism. We show that under a partial data coverage assumption, that of concentrability with respect to an optimal policy, the proposed algorithm yields a fast rate of $\tilde{O}(\frac{1}{K})$ for offline RL when there is a positive gap in the optimal Q-value functions, even when the offline data were adaptively…
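
To make the "pessimism" ingredient concrete, the following is a minimal sketch of generic pessimistic value iteration with linear features, not the paper's BCP-VI: the bootstrapping and constrained-optimization components named in the abstract are omitted. The function names, the array layout, the penalty scale `beta`, the ridge parameter `lam`, and the assumption that rewards lie in [0, 1] are all illustrative choices, not taken from the paper.

```python
import numpy as np

def pessimistic_q(feats, w, gram_inv, beta, q_max):
    """Linear Q estimate minus an uncertainty penalty, clipped to [0, q_max]."""
    # Penalty beta * sqrt(phi^T Lambda^{-1} phi) for every feature vector in feats.
    bonus = beta * np.sqrt(np.einsum("...i,ij,...j->...", feats, gram_inv, feats))
    return np.clip(feats @ w - bonus, 0.0, q_max)

def pevi(phi, rewards, next_phi, beta=1.0, lam=1.0):
    """Backward ridge regression with pessimistic targets (hypothetical layout).

    phi:      (H, K, d)    features of the logged (state, action) pairs
    rewards:  (H, K)       logged rewards, assumed to lie in [0, 1]
    next_phi: (H, K, A, d) features of every action at the logged next state
    """
    H, K, d = phi.shape
    weights = np.zeros((H, d))
    gram_invs = np.zeros((H, d, d))
    for h in range(H - 1, -1, -1):
        if h == H - 1:
            v_next = np.zeros(K)  # no value after the final step
        else:
            q_next = pessimistic_q(next_phi[h], weights[h + 1],
                                   gram_invs[h + 1], beta, H - h - 1)
            v_next = q_next.max(axis=1)
        gram = lam * np.eye(d) + phi[h].T @ phi[h]
        gram_invs[h] = np.linalg.inv(gram)
        weights[h] = gram_invs[h] @ (phi[h].T @ (rewards[h] + v_next))
    return weights, gram_invs

# Toy data: one step, one state, two actions with one-hot features.
# Action 0 is logged 100 times with reward 0.5; action 1 once with reward 1.0.
K0 = 100
phi = np.zeros((1, K0 + 1, 2))
phi[0, :K0, 0] = 1.0
phi[0, K0, 1] = 1.0
rewards = np.full((1, K0 + 1), 0.5)
rewards[0, K0] = 1.0
next_phi = np.zeros((1, K0 + 1, 2, 2))  # unused when H = 1

weights, gram_invs = pevi(phi, rewards, next_phi, beta=1.0, lam=1.0)
state_feats = np.eye(2)  # features of the two actions at the single state
plain_q = state_feats @ weights[0]
pess_q = pessimistic_q(state_feats, weights[0], gram_invs[0], beta=1.0, q_max=1.0)
print("plain greedy action:", int(plain_q.argmax()))
print("pessimistic greedy action:", int(pess_q.argmax()))
```

In this toy dataset the unpenalized estimate slightly favors the action seen only once, while the penalized estimate prefers the well-covered action. This is the behavior that partial-coverage guarantees rely on: the learned policy is only trusted where the data support it.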

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Data Classification · Advanced Bandit Algorithms Research