Provably Efficient Reinforcement Learning with Linear Function Approximation Under Adaptivity Constraints

Tianhao Wang; Dongruo Zhou; Quanquan Gu

arXiv:2101.02195 · cs.LG · January 4, 2022 · 5 cites

TL;DR

This paper introduces two efficient reinforcement learning algorithms for linear Markov decision processes under limited adaptivity constraints, achieving near-optimal regret bounds with fewer policy switches or batches.

Contribution

The paper proposes two RL algorithms with linear function approximation, one for the batch learning model and one for the rare policy switch model, that reduce adaptivity while retaining near-optimal regret bounds.
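To make the rare policy switch model concrete, the following is a minimal sketch of how often a learner would recompute its policy under a determinant-growth trigger: it switches only when $\det(\Lambda)$ of the regularized Gram matrix $\Lambda = \lambda I + \sum_k \phi_k \phi_k^\top$ has grown by a factor `eta` since the last switch. This trigger is a common device in the linear bandit literature and is assumed here for illustration; the function name, the single stream of feature vectors, and the parameters `lam` and `eta` are hypothetical and not taken from the paper.

```python
import math


def count_rare_switches(features, lam=1.0, eta=2.0):
    """Count policy switches under an assumed determinant-growth trigger.

    features: list of K feature vectors (lists of length d), one per episode.
    A switch is triggered when det(Lambda) exceeds eta times its value at the
    last switch. Lambda^{-1} is maintained with Sherman-Morrison updates and
    log det(Lambda) with the matrix determinant lemma, so no matrix is inverted.
    """
    d = len(features[0])
    # Lambda starts as lam * I, so Lambda^{-1} starts as (1 / lam) * I.
    inv = [[(1.0 / lam if i == j else 0.0) for j in range(d)] for i in range(d)]
    logdet_gain = 0.0  # log det(Lambda) minus its value at the last switch
    switches = 0
    for phi in features:
        # v = Lambda^{-1} phi and quad = phi^T Lambda^{-1} phi
        v = [sum(inv[i][j] * phi[j] for j in range(d)) for i in range(d)]
        quad = sum(phi[i] * v[i] for i in range(d))
        # Matrix determinant lemma: det(L + phi phi^T) = det(L) * (1 + quad)
        logdet_gain += math.log1p(quad)
        # Sherman-Morrison (Lambda^{-1} is symmetric, so the update is v v^T)
        for i in range(d):
            for j in range(d):
                inv[i][j] -= v[i] * v[j] / (1.0 + quad)
        if logdet_gain > math.log(eta):
            switches += 1
            logdet_gain = 0.0
    return switches
```

Because $\log\det\Lambda$ grows only logarithmically in the number of episodes $K$, the count stays far below $K$: with unit-norm features it is at most $d \log(1 + K/(d\lambda)) / \log \eta$.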

Findings

01

Achieves $\tilde{O}(\sqrt{d^3 H^3 T})$ regret, the same rate as fully adaptive algorithms.

02

Shows that only $\sqrt{T/(dH)}$ batches are needed to obtain near-optimal regret.

03

Establishes a lower bound for the batch learning model showing that the regret's dependence on the number of batches $B$ is tight.
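The batch count in the second finding follows from the batch-model bound $\tilde{O}(\sqrt{d^3 H^3 T} + dHT/B)$ stated in the abstract: the extra $dHT/B$ term stops mattering once it is no larger than the first term. A short derivation:

```latex
% Batch-model regret (constants and log factors hidden in \tilde{O})
\mathrm{Regret}(T) = \tilde{O}\Big(\sqrt{d^3 H^3 T} + \frac{dHT}{B}\Big)

% The batching term is dominated once
\frac{dHT}{B} \le \sqrt{d^3 H^3 T}
\iff B \ge \frac{dHT}{\sqrt{d^3 H^3 T}}
          = \sqrt{\frac{d^2 H^2 T^2}{d^3 H^3 T}}
          = \sqrt{\frac{T}{dH}}
```

With $B = \sqrt{T/(dH)}$ batches, the total regret is therefore $\tilde{O}(\sqrt{d^3 H^3 T})$, the rate of the fully adaptive setting.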

Abstract

We study reinforcement learning (RL) with linear function approximation under the adaptivity constraint. We consider two popular limited adaptivity models, the batch learning model and the rare policy switch model, and propose two efficient online RL algorithms for episodic linear Markov decision processes, where the transition probability and the reward function can be represented as a linear function of some known feature mapping. Specifically, for the batch learning model, our proposed LSVI-UCB-Batch algorithm achieves an $\tilde{O}(\sqrt{d^3 H^3 T} + dHT/B)$ regret, where $d$ is the dimension of the feature mapping, $H$ is the episode length, $T$ is the number of interactions and $B$ is the number of batches. Our result suggests that it suffices to use only $\sqrt{T/(dH)}$ batches to obtain $\tilde{O}(\sqrt{d^3 H^3 T})$ regret. For the rare policy switch model, our proposed…
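The batch-model bound above can be explored numerically. The following Python sketch evaluates its two terms with constants and logarithmic factors dropped, computes the smallest integer batch count for which the batching term no longer dominates, and lays out an equal-size batch grid over $K$ episodes at which the policy would be recomputed. The equal-size grid and all function names are illustrative assumptions, not the paper's pseudocode.

```python
import math


def batch_bound_terms(T, d, H, B):
    """Two terms of the batch-model bound, ignoring constants and log factors."""
    exploration = math.sqrt(d**3 * H**3 * T)  # sqrt(d^3 H^3 T): fully adaptive rate
    batching = d * H * T / B                  # dHT / B: cost of limited adaptivity
    return exploration, batching


def sufficient_batches(T, d, H):
    """Smallest integer B with dHT / B <= sqrt(d^3 H^3 T), i.e. B >= sqrt(T / (dH))."""
    return math.ceil(math.sqrt(T / (d * H)))


def batch_start_episodes(K, B):
    """Episodes (0-indexed) where the policy is recomputed, assuming B equal-size batches."""
    return sorted({(b * K) // B for b in range(B)})
```

For example, with $T = 10^8$ and $d = H = 4$, `sufficient_batches` returns 2500, and at that batch count both terms equal $6.4 \times 10^5$.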

Videos

Provably Efficient Reinforcement Learning with Linear Function Approximation under Adaptivity Constraints· slideslive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Age of Information Optimization