Sample-Efficient Reinforcement Learning for Linearly-Parameterized MDPs with a Generative Model
Bingyan Wang, Yuling Yan, Jianqing Fan

TL;DR
This paper demonstrates that for large-scale MDPs with linear features, model-based RL and Q-learning can learn near-optimal policies efficiently, with sample complexities depending on feature dimension rather than state-action space size.
Contribution
The paper provides sample complexity bounds for model-based RL and Q-learning in linearly-parameterized MDPs with a generative model. The model-based bound matches the minimax lower bound, and both methods are sample-efficient when the feature dimension K is small.
Findings
Sample complexity scales with feature dimension K, not state-action space size.
The model-based approach matches the minimax lower bound on sample complexity.
Q-learning also achieves near-optimal sample complexity under the linear feature assumption.
Abstract
The curse of dimensionality is a widely known issue in reinforcement learning (RL). In the tabular setting where the state space S and the action space A are both finite, to obtain a nearly optimal policy with sampling access to a generative model, the minimax optimal sample complexity scales linearly with |S| × |A|, which can be prohibitively large when |S| or |A| is large. This paper considers a Markov decision process (MDP) that admits a set of state-action features, which can linearly express (or approximate) its probability transition kernel. We show that a model-based approach (resp. Q-learning) provably learns an ε-optimal policy (resp. Q-function) with high probability as soon as the sample size exceeds the order of …
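The idea of a transition kernel that is linear in state-action features can be illustrated with a small numerical sketch. It is not the authors' algorithm or code: it assumes, for illustration only, that every state-action pair's feature vector is a convex combination of K "anchor" pairs, so the full kernel is `lam @ P_anchor`. All names (`make_linear_mdp`, `estimate_anchor_kernel`, `value_iteration`, `lam`, `P_anchor`) and all sizes are hypothetical. The generative model is queried only at the K anchor pairs, so the number of samples depends on K and not on |S| × |A|.

```python
import numpy as np


def value_iteration(P, r, gamma, iters=500):
    """Q-value iteration. P: (S*A, S) kernel, r: (S*A,) rewards, row = s * A + a."""
    n_sa, n_s = P.shape
    n_a = n_sa // n_s
    Q = np.zeros(n_sa)
    for _ in range(iters):
        V = Q.reshape(n_s, n_a).max(axis=1)  # greedy state values
        Q = r + gamma * (P @ V)              # Bellman optimality update
    return Q


def make_linear_mdp(n_s, n_a, K, rng):
    """Toy MDP whose kernel is lam @ P_anchor (assumed structure, see lead-in)."""
    lam = rng.dirichlet(np.ones(K), size=n_s * n_a)  # convex weights per (s, a)
    lam[:K] = np.eye(K)                              # first K pairs act as anchors
    P_anchor = rng.dirichlet(np.ones(n_s), size=K)   # next-state law of each anchor
    r = rng.uniform(size=n_s * n_a)                  # rewards in [0, 1]
    return lam, P_anchor, r


def estimate_anchor_kernel(P_anchor, n_samples, rng):
    """Draw n_samples next states per anchor and return empirical frequencies."""
    K, n_s = P_anchor.shape
    P_hat = np.zeros((K, n_s))
    for k in range(K):
        nxt = rng.choice(n_s, size=n_samples, p=P_anchor[k])
        P_hat[k] = np.bincount(nxt, minlength=n_s) / n_samples
    return P_hat


rng = np.random.default_rng(0)
n_s, n_a, K, gamma, n_samples = 20, 3, 4, 0.9, 20000

lam, P_anchor, r = make_linear_mdp(n_s, n_a, K, rng)
Q_star = value_iteration(lam @ P_anchor, r, gamma)   # planning with the true kernel

P_hat_anchor = estimate_anchor_kernel(P_anchor, n_samples, rng)
Q_hat = value_iteration(lam @ P_hat_anchor, r, gamma)  # planning with the estimate

err = np.max(np.abs(Q_hat - Q_star))
total_samples = K * n_samples  # grows with K, not with n_s * n_a
print(f"samples used: {total_samples}, sup-norm Q error: {err:.4f}")
```

In this sketch, increasing `n_s` or `n_a` while holding `K` fixed leaves `total_samples` unchanged, which is the qualitative behaviour the abstract describes. The sketch makes no claim about the exact rates established in the paper.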
Taxonomy
Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Formal Methods in Verification
Methods: Q-Learning
