Almost Optimal Model-Free Reinforcement Learning via Reference-Advantage Decomposition

Zihan Zhang; Yuan Zhou; Xiangyang Ji

arXiv:2004.10019 · cs.LG · June 9, 2020 · 45 cites

TL;DR

This paper introduces UCB-Advantage, a model-free reinforcement learning algorithm for finite-horizon episodic MDPs whose regret matches the best known model-based algorithms and the information-theoretic lower bound up to logarithmic factors.

Contribution

The paper presents UCB-Advantage, a model-free RL algorithm whose regret bound improves on that of [Jin et al., 2018]. The algorithm also has low local switching cost and applies to concurrent RL, improving on the results of [Bai et al., 2019].

Findings

01

Achieves a $\tilde{O}(\sqrt{H^2SAT})$ regret bound

02

Matches the best known model-based algorithms and the information-theoretic lower bound, up to logarithmic factors

03

Has low local switching cost and supports concurrent RL

Abstract

We study the reinforcement learning problem in the setting of finite-horizon episodic Markov Decision Processes (MDPs) with $S$ states, $A$ actions, and episode length $H$. We propose a model-free algorithm UCB-Advantage and prove that it achieves $\tilde{O}(\sqrt{H^{2} S A T})$ regret where $T = K H$ and $K$ is the number of episodes to play. Our regret bound improves upon the results of [Jin et al., 2018] and matches the best known model-based algorithms as well as the information-theoretic lower bound up to logarithmic factors. We also show that UCB-Advantage achieves low local switching cost and applies to concurrent reinforcement learning, improving upon the recent results of [Bai et al., 2019].
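
To get a feel for how the bound scales, the sketch below evaluates only its leading term, taking the regret to be of order $\sqrt{H^2 S A T}$ with $T = KH$ and dropping all constants and logarithmic factors. It is an illustration of the formula, not the UCB-Advantage algorithm itself; the function name `regret_leading_term` and the problem sizes are hypothetical choices made for this example.

```python
import math


def regret_leading_term(S: int, A: int, H: int, K: int) -> float:
    """Leading term sqrt(H^2 * S * A * T) with T = K * H.

    Constants and logarithmic factors hidden by the O-tilde notation are dropped,
    so the value is only meaningful for comparing scaling, not as an actual bound.
    """
    T = K * H  # total number of steps over K episodes of length H
    return math.sqrt(H**2 * S * A * T)


# Hypothetical problem sizes, chosen only for illustration.
S, A, H = 10, 4, 5
for K in (1_000, 4_000, 16_000):
    term = regret_leading_term(S, A, H, K)
    # Dividing by K gives the leading term of the average regret per episode.
    print(f"K={K:>6}  leading term={term:10.1f}  per episode={term / K:.4f}")
```

Because the term grows as $\sqrt{K}$, quadrupling the number of episodes doubles the leading term while halving its per-episode average.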

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning