Provably Efficient Reinforcement Learning with Linear Function Approximation Under Adaptivity Constraints
Tianhao Wang, Dongruo Zhou, Quanquan Gu

TL;DR
This paper introduces two efficient reinforcement learning algorithms for linear Markov decision processes under limited adaptivity constraints, achieving near-optimal regret bounds with fewer policy switches or batches.
Contribution
The paper proposes new RL algorithms with linear function approximation for the batch learning and rare policy switch models, reducing adaptivity while retaining near-optimal regret bounds.
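To make the two adaptivity models concrete, here is a minimal Python sketch under stated assumptions. In the batch model, the episodes at which the policy may change are fixed before learning starts; we assume the simplest choice, a uniform grid of at most B batches. In the rare policy switch model, the learner decides online when to switch; we assume a determinant-ratio trigger on the Gram matrix with a hypothetical threshold `eta`, a rule familiar from rarely switching linear bandits that may differ in detail from the paper's rule. The names `uniform_batch_grid` and `should_switch` are illustrative, not taken from the paper.

```python
import numpy as np


def uniform_batch_grid(K: int, B: int) -> list[int]:
    """Start episodes (0-based) of at most B equal-size batches, fixed before learning."""
    size = -(-K // B)  # ceil(K / B) episodes per batch
    return list(range(0, K, size))


def should_switch(gram_now: np.ndarray, gram_at_switch: np.ndarray, eta: float) -> bool:
    """Assumed adaptive trigger: switch once det(Lambda) has grown by more than eta."""
    # Compare log-determinants; slogdet avoids overflow for large feature dimensions.
    _, logdet_now = np.linalg.slogdet(gram_now)
    _, logdet_old = np.linalg.slogdet(gram_at_switch)
    return bool(logdet_now - logdet_old > np.log(eta))


# Batch model: K = 10 episodes, B = 3 batches -> updates only at episodes 0, 4, 8.
print(uniform_batch_grid(10, 3))

# Rare switch model: one strong feature direction can trigger a switch.
gram_old = np.eye(2)
phi = np.array([1.0, 1.0])
gram_new = gram_old + np.outer(phi, phi)  # det grows from 1 to 3
print(should_switch(gram_new, gram_old, eta=2.0))  # True, since 3 > 2
```

The design difference is the point of the sketch: the batch grid is fixed before any data arrives, whereas the switch trigger reacts to how much information has accumulated.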
Findings
Achieves $\tilde O(\sqrt{d^3H^3T})$ regret, the same order as fully adaptive algorithms.
Shows that only $\sqrt{T/dH}$ batches are needed to obtain near-optimal regret.
Establishes a lower bound for the batch learning model showing that the regret's dependence on the number of batches $B$ is tight.
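The batch-count finding follows from balancing the two terms of the LSVI-UCB-Batch regret bound $\tilde O(\sqrt{d^3H^3T} + dHT/B)$: the batching cost drops to the fully adaptive order once $dHT/B \le \sqrt{d^3H^3T}$, that is, once $B \ge \sqrt{T/(dH)}$. The arithmetic below is a sketch that ignores logarithmic factors and constants; the function names and parameter values are illustrative assumptions.

```python
import math


def batch_regret_terms(d: int, H: int, T: int, B: int) -> tuple[float, float]:
    """Leading terms of O~(sqrt(d^3 H^3 T) + d H T / B), log factors and constants dropped."""
    adaptive_term = math.sqrt(d**3 * H**3 * T)  # order of the fully adaptive bound
    batching_term = d * H * T / B                # extra cost of only B policy updates
    return adaptive_term, batching_term


def batches_needed(d: int, H: int, T: int) -> int:
    """Smallest integer B with d H T / B <= sqrt(d^3 H^3 T), i.e. B >= sqrt(T / (d H))."""
    return math.ceil(math.sqrt(T / (d * H)))


d, H, T = 10, 20, 2_000_000
B = batches_needed(d, H, T)
adaptive, batching = batch_regret_terms(d, H, T, B)
print(B, adaptive, batching)  # 100 4000000.0 4000000.0
```

With B = 10 instead of 100, the batching term grows tenfold and dominates the bound.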
Abstract
We study reinforcement learning (RL) with linear function approximation under the adaptivity constraint. We consider two popular limited adaptivity models: the batch learning model and the rare policy switch model, and propose two efficient online RL algorithms for episodic linear Markov decision processes, where the transition probability and the reward function can be represented as a linear function of some known feature mapping. Specifically, for the batch learning model, our proposed LSVI-UCB-Batch algorithm achieves an $\tilde O(\sqrt{d^3H^3T} + dHT/B)$ regret, where $d$ is the dimension of the feature mapping, $H$ is the episode length, $T$ is the number of interactions and $B$ is the number of batches. Our result suggests that it suffices to use only $\sqrt{T/dH}$ batches to obtain $\tilde O(\sqrt{d^3H^3T})$ regret. For the rare policy switch model, our proposed…
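Both algorithms build on least-squares value iteration with an upper-confidence bonus (LSVI-UCB). In a linear MDP, each backward step regresses $r_h + V_{h+1}(s')$ on the known features $\phi(s,a)$ and adds an exploration bonus $\beta\sqrt{\phi^\top\Lambda_h^{-1}\phi}$. Below is a minimal NumPy sketch of one such step as commonly presented for LSVI-UCB; the regularizer `lam`, bonus scale `beta`, array layout, and the name `lsvi_ucb_step` are assumptions. The limited-adaptivity variants are expected to differ mainly in when this computation is rerun (for instance, only at batch boundaries) rather than in its form.

```python
import numpy as np


def lsvi_ucb_step(phi, rewards, next_values, lam: float, beta: float, H: float):
    """One backward step h of least-squares value iteration with a UCB bonus.

    phi:         (n, d) features phi(s_h, a_h) of the n transitions collected so far
    rewards:     (n,) observed rewards r_h
    next_values: (n,) estimates of V_{h+1}(s_{h+1}) from the previous backward step
    Returns a function mapping a feature vector phi(s, a) to an optimistic Q_h value.
    """
    phi = np.asarray(phi, dtype=float)
    targets = np.asarray(rewards, dtype=float) + np.asarray(next_values, dtype=float)
    d = phi.shape[1]
    gram = phi.T @ phi + lam * np.eye(d)        # Lambda_h = sum phi phi^T + lam * I
    w = np.linalg.solve(gram, phi.T @ targets)  # ridge-regression weights w_h
    gram_inv = np.linalg.inv(gram)

    def q(feature):
        feature = np.asarray(feature, dtype=float)
        bonus = beta * np.sqrt(feature @ gram_inv @ feature)  # elliptical exploration bonus
        return min(float(feature @ w + bonus), float(H))      # optimistic value, clipped at H

    return q


# Two transitions with orthogonal features; only the first is rewarded.
q_h = lsvi_ucb_step(np.eye(2), rewards=[1.0, 0.0], next_values=[0.0, 0.0],
                    lam=1.0, beta=0.0, H=5)
print(q_h([1.0, 0.0]))  # 0.5: ridge regularization shrinks the single reward of 1 toward 0
```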
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Age of Information Optimization
