Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping

Dongruo Zhou; Jiafan He; Quanquan Gu

arXiv:2006.13165·cs.LG·February 24, 2021·30 cites

TL;DR

This paper introduces a new reinforcement learning algorithm for discounted MDPs with feature mappings, achieving near-optimal regret bounds without requiring a generative model or ergodicity assumptions.

Contribution

The paper presents the first polynomial regret bound for feature-based RL in discounted MDPs that requires neither access to a generative model nor strong assumptions such as ergodicity, and establishes a nearly matching lower bound.

Findings

01

Achieves regret of $\tilde{O}\left(\frac{d\sqrt{T}}{(1-\gamma)^{2}}\right)$

02

Provides a lower bound of $\tilde{\Omega}\left(\frac{d\sqrt{T}}{(1-\gamma)^{1.5}}\right)$

03

Demonstrates near-optimality of the proposed algorithm
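
To make "near-optimal" concrete, the sketch below compares the leading terms of the two bounds, $d\sqrt{T}/(1-\gamma)^{2}$ and $d\sqrt{T}/(1-\gamma)^{1.5}$. It ignores constants and logarithmic factors, and the function names are illustrative rather than taken from the paper. The ratio depends only on the discount factor and equals $(1-\gamma)^{-0.5}$.

```python
import math


def regret_upper(d, T, gamma):
    """Leading term of the upper bound, d*sqrt(T)/(1-gamma)^2 (no constants or logs)."""
    return d * math.sqrt(T) / (1.0 - gamma) ** 2


def regret_lower(d, T, gamma):
    """Leading term of the lower bound, d*sqrt(T)/(1-gamma)^1.5 (no constants or logs)."""
    return d * math.sqrt(T) / (1.0 - gamma) ** 1.5


def gap(gamma):
    """Ratio of the two leading terms; d and sqrt(T) cancel."""
    return (1.0 - gamma) ** -0.5


if __name__ == "__main__":
    for gamma in (0.9, 0.99, 0.999):
        ratio = regret_upper(10, 10_000, gamma) / regret_lower(10, 10_000, gamma)
        print(f"gamma={gamma}: upper/lower = {ratio:.3f}, (1-gamma)^-0.5 = {gap(gamma):.3f}")
```

The two bounds agree in their dependence on $d$ and $T$; the remaining gap is a factor of $(1-\gamma)^{-0.5}$, which grows as the discount factor approaches 1.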

Abstract

Modern tasks in reinforcement learning have large state and action spaces. To deal with them efficiently, one often uses a predefined feature mapping to represent states and actions in a low-dimensional space. In this paper, we study reinforcement learning for discounted Markov Decision Processes (MDPs), where the transition kernel can be parameterized as a linear function of a certain feature mapping. We propose a novel algorithm that makes use of the feature mapping and obtains a $\tilde{O}(d\sqrt{T}/(1-\gamma)^{2})$ regret, where $d$ is the dimension of the feature space, $T$ is the time horizon and $\gamma$ is the discount factor of the MDP. To the best of our knowledge, this is the first polynomial regret bound without accessing the generative model or making strong assumptions such as ergodicity of the MDP. By constructing a special class of MDPs, we also show that for any algorithms,…
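
As an illustration of a transition kernel that is linear in a feature mapping, the sketch below builds a small tabular example in which $P(s' \mid s, a) = \langle \phi(s' \mid s, a), \theta \rangle$. The construction is an assumption made here for illustration, not the paper's setup: each feature coordinate is a base transition kernel and $\theta$ is a vector of mixture weights, which guarantees a valid probability distribution. All names (`make_features`, `transition_prob`, `expected_value`) are hypothetical.

```python
import random


def random_distribution(n, rng):
    """Return a random probability vector of length n."""
    weights = [rng.random() + 1e-3 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


def make_features(num_states, num_actions, d, rng):
    """phi[s][a][s_next] is a length-d vector; coordinate i is a base kernel P_i(s_next | s, a)."""
    phi = [[[[0.0] * d for _ in range(num_states)]
            for _ in range(num_actions)]
           for _ in range(num_states)]
    for i in range(d):
        for s in range(num_states):
            for a in range(num_actions):
                dist = random_distribution(num_states, rng)
                for s_next in range(num_states):
                    phi[s][a][s_next][i] = dist[s_next]
    return phi


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def transition_prob(phi, theta, s, a, s_next):
    """P(s_next | s, a) = <phi(s_next | s, a), theta>."""
    return dot(phi[s][a][s_next], theta)


def expected_value(phi, theta, V, s, a):
    """E[V(s') | s, a] = <phi_V(s, a), theta>, with phi_V(s, a) = sum_{s'} phi(s' | s, a) V(s')."""
    d = len(theta)
    phi_V = [sum(phi[s][a][n][i] * V[n] for n in range(len(V))) for i in range(d)]
    return dot(phi_V, theta)


rng = random.Random(0)
NUM_STATES, NUM_ACTIONS, DIM = 4, 2, 3
phi = make_features(NUM_STATES, NUM_ACTIONS, DIM, rng)
theta = [0.5, 0.3, 0.2]       # mixture weights (assumed; they sum to 1)
V = [0.0, 1.0, 2.0, 3.0]      # an arbitrary value function for the demo
```

Under this parameterization the expected next-state value is itself linear in $\theta$, so a learner only has to estimate the $d$-dimensional vector $\theta$ rather than the full transition table.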

Videos

Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping (SlidesLive)

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Machine Learning and Algorithms