Is Plug-in Solver Sample-Efficient for Feature-based Reinforcement Learning?

Qiwen Cui; Lin F. Yang

arXiv:2010.05673 · cs.LG · October 20, 2020 · 6 cites

TL;DR

This paper analyzes the sample complexity of model-based reinforcement learning with feature representations. It proves that a plug-in solver approach is sample-efficient under the anchor-state assumption, with a sample complexity that depends on the feature dimension rather than on the size of the state or action space.

Contribution

It establishes the minimax sample complexity bounds for feature-based RL using a plug-in solver, including cases with and without anchor-states.

Findings

01

Sample complexity is $O(K / (1 - γ)^{3} ϵ^{2})$ under the anchor-state assumption.

02

The plug-in approach remains sample-efficient in a relaxed setting where anchor-states may not exist.

03

The bound depends on the feature dimension $K$, not on the size of the state or action space.
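To get a feel for how the $K / (1 - γ)^{3} ϵ^{2}$ rate scales, the short Python sketch below evaluates only the leading term. The function name `bound_scale` and the numeric inputs are illustrative assumptions; constants and any logarithmic factors are dropped, so the outputs are relative scales and not actual sample counts.

```python
def bound_scale(K, gamma, eps):
    """Leading term K / ((1 - gamma)^3 * eps^2); constants and log factors are dropped."""
    return K / ((1.0 - gamma) ** 3 * eps ** 2)


# Illustrative inputs only (not taken from the paper).
base = bound_scale(K=10, gamma=0.9, eps=0.1)
half_eps = bound_scale(K=10, gamma=0.9, eps=0.05)          # halve the target accuracy
longer_horizon = bound_scale(K=10, gamma=0.95, eps=0.1)    # halve 1 - gamma

print("halving eps multiplies the term by", half_eps / base)
print("halving 1 - gamma multiplies the term by", longer_horizon / base)
```

Note that the numbers of states and actions do not appear as arguments at all, which is the point of the third finding.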

Abstract

It is believed that a model-based approach for reinforcement learning (RL) is the key to reducing sample complexity. However, the understanding of the sample optimality of model-based RL is still largely missing, even for the linear case. This work considers the sample complexity of finding an $ϵ$-optimal policy in a Markov decision process (MDP) that admits a linear additive feature representation, given only access to a generative model. We solve this problem via a plug-in solver approach, which builds an empirical model and plans in this empirical model via an arbitrary plug-in solver. We prove that under the anchor-state assumption, which implies implicit non-negativity in the feature space, the minimax sample complexity of finding an $ϵ$-optimal policy in a $γ$-discounted MDP is $O(K / (1 - γ)^{3} ϵ^{2})$, which only depends on the dimensionality $K$ of the feature…
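The plug-in idea described in the abstract (estimate a model from generative samples, then hand it to any planner) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes every state-action pair's transition distribution is a known convex combination of the distributions at $K$ anchor pairs, it uses plain value iteration as a stand-in for the arbitrary plug-in solver, and all names (`plug_in_policy`, `sample_next_state`, `weights`) and the toy problem sizes are hypothetical.

```python
import numpy as np


def value_iteration(P, r, gamma, iters=2000):
    """Stand-in plug-in solver. P has shape (S, A, S), r has shape (S, A)."""
    v = np.zeros(P.shape[0])
    for _ in range(iters):
        q = r + gamma * (P @ v)          # Bellman backup, shape (S, A)
        v = q.max(axis=1)
    return q.argmax(axis=1), v


def policy_value(P, r, gamma, policy):
    """Exact value of a deterministic policy, via a linear solve."""
    S = P.shape[0]
    idx = np.arange(S)
    return np.linalg.solve(np.eye(S) - gamma * P[idx, policy], r[idx, policy])


def plug_in_policy(sample_next_state, weights, r, gamma, n_samples, rng):
    """Build an empirical model from anchor samples, then plan in it.

    weights: (S, A, K) convex weights over K anchor pairs (assumed known here).
    """
    S, A, K = weights.shape
    P_hat_anchor = np.zeros((K, S))
    for k in range(K):
        draws = sample_next_state(k, n_samples, rng)   # generative-model calls
        P_hat_anchor[k] = np.bincount(draws, minlength=S) / n_samples
    P_hat = weights @ P_hat_anchor                      # empirical model, (S, A, S)
    policy, _ = value_iteration(P_hat, r, gamma)        # any planner could go here
    return policy


# Toy instance (sizes and distributions are made up for illustration).
rng = np.random.default_rng(0)
S, A, K, gamma = 6, 2, 3, 0.9
P_anchor = rng.dirichlet(np.ones(S), size=K)            # (K, S)
weights = rng.dirichlet(np.ones(K), size=(S, A))        # (S, A, K)
r = rng.uniform(size=(S, A))
P_true = weights @ P_anchor


def sample_next_state(k, n, rng):
    return rng.choice(S, size=n, p=P_anchor[k])


policy = plug_in_policy(sample_next_state, weights, r, gamma, n_samples=5000, rng=rng)
_, v_star = value_iteration(P_true, r, gamma)
gap = np.max(v_star - policy_value(P_true, r, gamma, policy))
print("suboptimality of the plug-in policy in the true MDP:", gap)
```

In this sketch the total number of generative-model calls is `K * n_samples`, independent of `S` and `A`, and swapping `value_iteration` for another planner leaves the rest of the pipeline unchanged.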

Videos

Is Plug-in Solver Sample-Efficient for Feature-based Reinforcement Learning? · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Advanced Multi-Objective Optimization Algorithms