Near-Optimal Deployment Efficiency in Reward-Free Reinforcement Learning with Linear Function Approximation

Dan Qiao; Yu-Xiang Wang

arXiv:2210.00701 · cs.LG · February 23, 2023

TL;DR

This paper introduces an algorithm for reward-free reinforcement learning with linear function approximation that simultaneously achieves optimal deployment complexity and optimal dependence on the feature dimension $d$ in sample complexity, which matters for real-world applications where deploying new policies is costly.

Contribution

It presents what the authors believe to be the first algorithm to achieve optimal deployment complexity and optimal $d$ dependence in sample complexity at the same time for reward-free RL under linear MDPs.

Findings

01

Achieves $\tilde{O}(d^2H^5/\epsilon^2)$ trajectory complexity for $\epsilon$-optimal policies.

02

Introduces exploration-preserving policy discretization and a generalized G-optimal experiment design.

03

Provides lower bounds for switching cost and batch complexity in low-adaptive RL.
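To make the experiment-design idea in finding 02 concrete, here is a minimal sketch of the classical G-optimal design problem over a finite set of feature vectors, solved with a Frank-Wolfe (Fedorov-Wynn style) update. This is the textbook version, not the paper's generalized design over policies, which this page does not describe. All function names (`solve`, `design_matrix`, `leverage`, `g_optimal_design`) and the tolerance are assumptions chosen for illustration.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def design_matrix(features, weights):
    """Return A(w) = sum_i w_i * phi_i phi_i^T as a list of lists."""
    d = len(features[0])
    A = [[0.0] * d for _ in range(d)]
    for phi, w in zip(features, weights):
        for i in range(d):
            for j in range(d):
                A[i][j] += w * phi[i] * phi[j]
    return A


def leverage(A, phi):
    """Return phi^T A^{-1} phi, the quantity a G-optimal design minimizes in the worst case."""
    y = solve(A, phi)
    return sum(p * q for p, q in zip(phi, y))


def g_optimal_design(features, tol=0.01, max_iters=10000):
    """Approximate a G-optimal design over a finite feature set.

    Assumes the features span R^d so that A(w) is invertible at the
    uniform starting point. Stops once max_i leverage <= d * (1 + tol);
    by the Kiefer-Wolfowitz theorem the optimal value is exactly d.
    """
    n, d = len(features), len(features[0])
    w = [1.0 / n] * n
    for _ in range(max_iters):
        A = design_matrix(features, w)
        scores = [leverage(A, phi) for phi in features]
        k = max(range(n), key=scores.__getitem__)
        g = scores[k]
        if g <= d * (1.0 + tol):
            break
        # Exact line-search step for the log-det objective.
        step = (g / d - 1.0) / (g - 1.0)
        w = [(1.0 - step) * wi for wi in w]
        w[k] += step
    return w


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    weights = g_optimal_design(feats)
    print([round(wi, 3) for wi in weights])
```

In this toy instance the design should concentrate almost all of its weight on the two basis vectors, since the third vector lies inside the region they already cover well.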

Abstract

We study the problem of deployment efficient reinforcement learning (RL) with linear function approximation under the reward-free exploration setting. This is a well-motivated problem because deploying new policies is costly in real-life RL applications. Under the linear MDP setting with feature dimension $d$ and planning horizon $H$, we propose a new algorithm that collects at most $\tilde{O}(\frac{d^{2}H^{5}}{\epsilon^{2}})$ trajectories within $H$ deployments to identify an $\epsilon$-optimal policy for any (possibly data-dependent) choice of reward functions. To the best of our knowledge, our approach is the first to achieve optimal deployment complexity and optimal $d$ dependence in sample complexity at the same time, even if the reward is known ahead of time. Our novel techniques include an exploration-preserving policy discretization and a generalized G-optimal experiment…
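As a rough feel for how the trajectory bound in the abstract scales, the sketch below evaluates only its leading term $d^2H^5/\epsilon^2$. It deliberately ignores the constants and logarithmic factors hidden by the big-O notation, so the numbers are orders of magnitude, not guarantees. The function names and the even split across $H$ deployments are assumptions made for illustration; the page does not say how trajectories are divided among deployments.

```python
def leading_term_trajectories(d, H, eps):
    """Leading term d^2 * H^5 / eps^2 of the stated bound (no constants, no log factors)."""
    return d ** 2 * H ** 5 / eps ** 2


def per_deployment(d, H, eps):
    """Average trajectories per deployment, assuming an even split over H deployments."""
    return leading_term_trajectories(d, H, eps) / H


if __name__ == "__main__":
    d, H, eps = 10, 5, 0.1  # hypothetical problem sizes
    print(f"total      ~ {leading_term_trajectories(d, H, eps):.3g}")
    print(f"per deploy ~ {per_deployment(d, H, eps):.3g}")
    # Halving eps multiplies the leading term by four.
    print(leading_term_trajectories(d, H, eps / 2) / leading_term_trajectories(d, H, eps))
```

With these hypothetical values the leading term is on the order of $3 \times 10^7$ trajectories, and the $H^5$ factor means the horizon dominates the cost far more quickly than the feature dimension does.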

Videos

Near-Optimal Deployment Efficiency in Reward-Free Reinforcement Learning with Linear Function Approximation · SlidesLive

Taxonomy

Topics: Age of Information Optimization · Reinforcement Learning in Robotics · Energy Harvesting in Wireless Networks