Sample-Efficient Reinforcement Learning for Linearly-Parameterized MDPs with a Generative Model

Bingyan Wang; Yuling Yan; Jianqing Fan

arXiv:2105.14016 · cs.LG · October 28, 2022 · 6 cites

TL;DR

This paper demonstrates that for large-scale MDPs with linear features, model-based RL and Q-learning can learn near-optimal policies efficiently, with sample complexities depending on feature dimension rather than state-action space size.

Contribution

The paper provides tight sample complexity bounds for model-based RL and Q-learning in linearly-parameterized MDPs, matching minimax lower bounds and showing efficiency when feature dimension is small.

Findings

01

Sample complexity scales with feature dimension K, not state-action space size.

02

The model-based approach attains a sample complexity that matches the minimax lower bound.

03

Q-learning also achieves near-optimal sample complexity under the linear feature assumption.

Abstract

The curse of dimensionality is a widely known issue in reinforcement learning (RL). In the tabular setting where the state space $S$ and the action space $A$ are both finite, to obtain a nearly optimal policy with sampling access to a generative model, the minimax optimal sample complexity scales linearly with $|S| \times |A|$, which can be prohibitively large when $S$ or $A$ is large. This paper considers a Markov decision process (MDP) that admits a set of state-action features, which can linearly express (or approximate) its probability transition kernel. We show that a model-based approach (resp. Q-learning) provably learns an $\varepsilon$-optimal policy (resp. Q-function) with high probability as soon as the sample size exceeds the order of $\frac{K}{(1 - \gamma)^{3} \varepsilon^{2}}$ …
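To get a feel for the scaling in the abstract, the short sketch below evaluates the order $\frac{K}{(1 - \gamma)^{3} \varepsilon^{2}}$ that the abstract gives for the model-based approach, and the factor by which a count linear in $|S| \times |A|$ exceeds a count linear in $K$. The function names and example values are hypothetical and chosen only for illustration. Constants and logarithmic factors are omitted, so the numbers indicate orders of magnitude rather than exact sample sizes. The remainder of the abstract's sentence is truncated in the source and is not reconstructed here.

```python
import math


def model_based_sample_order(K: int, gamma: float, eps: float) -> float:
    """Return K / ((1 - gamma)^3 * eps^2), the order quoted in the abstract.

    Assumption: constants and log factors are dropped, so this is only an
    order-of-magnitude indicator, not a guaranteed sample size.
    """
    if K <= 0:
        raise ValueError("feature dimension K must be positive")
    if not 0.0 <= gamma < 1.0:
        raise ValueError("discount factor gamma must lie in [0, 1)")
    if eps <= 0.0:
        raise ValueError("accuracy eps must be positive")
    return K / ((1.0 - gamma) ** 3 * eps ** 2)


def tabular_to_feature_ratio(num_states: int, num_actions: int, K: int) -> float:
    """Ratio |S| * |A| / K: how much larger a count linear in |S| x |A|
    is than a count linear in K, all other factors held equal."""
    return (num_states * num_actions) / K


if __name__ == "__main__":
    # Hypothetical problem sizes, used purely for illustration.
    K, gamma, eps = 10, 0.9, 0.1
    order = model_based_sample_order(K, gamma, eps)
    ratio = tabular_to_feature_ratio(num_states=10_000, num_actions=10, K=K)
    print(f"order of samples (no constants/logs): {order:.3g}")
    print(f"|S||A| / K ratio: {ratio:.3g}")
```

The cubic dependence on $\frac{1}{1 - \gamma}$ means the effective horizon dominates quickly: moving $\gamma$ from 0.9 to 0.99 multiplies the order by roughly 1000, while doubling $K$ only doubles it.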

Videos

Sample-Efficient Reinforcement Learning for Linearly-Parameterized MDPs with a Generative Model · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Formal Methods in Verification

Methods: Q-Learning