Posterior sampling for reinforcement learning: worst-case regret bounds

Shipra Agrawal; Randy Jia

arXiv:1705.07041 · cs.LG · April 1, 2020 · 2 cites

TL;DR

This paper introduces a posterior sampling algorithm for reinforcement learning that achieves near-optimal worst-case regret bounds in finite, communicating Markov Decision Processes, with a high-probability guarantee that closely matches the known lower bound.

Contribution

The paper presents a new posterior sampling algorithm with proven near-optimal worst-case regret bounds for communicating MDPs, including novel anti-concentration results for Dirichlet distributions.
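The role of the Dirichlet distribution can be illustrated with a minimal sketch of the posterior-sampling step for transition probabilities. This is a generic illustration, not the paper's algorithm, whose details are not given in this summary. It assumes a symmetric Dirichlet prior over next states, and the names `sample_dirichlet`, `sample_transition_model`, and the `prior` parameter are hypothetical.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_transition_model(counts, prior=1.0, rng=None):
    """Sample a transition model from a Dirichlet posterior (illustrative only).

    counts[s][a][s2] is the number of observed transitions s --a--> s2.
    prior is an assumed symmetric Dirichlet parameter, added to every count.
    Returns model[s][a], a sampled probability vector over next states.
    """
    rng = rng or random.Random()
    return [
        [sample_dirichlet([c + prior for c in counts_sa], rng) for counts_sa in counts_s]
        for counts_s in counts
    ]


# Hypothetical example: 2 states, 2 actions, made-up transition counts.
counts = [
    [[3, 1], [0, 0]],
    [[2, 2], [0, 5]],
]
model = sample_transition_model(counts, rng=random.Random(0))
for s, row in enumerate(model):
    for a, probs in enumerate(row):
        print(f"state {s}, action {a}: {[round(p, 3) for p in probs]}")
```

A posterior sampling method would then plan in the sampled model and act on the resulting policy. State-action pairs with few observations produce widely spread samples, which is what drives exploration.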

Findings

01

Regret bound of Õ(DS√(AT)) for communicating MDPs

02

Closely matches the known lower bound of Ω(√(DSAT))

03

Novel anti-concentration results for Dirichlet distributions

Abstract

We present an algorithm based on posterior sampling (aka Thompson sampling) that achieves near-optimal worst-case regret bounds when the underlying Markov Decision Process (MDP) is communicating with a finite, though unknown, diameter. Our main result is a high probability regret upper bound of $\tilde{O}(DS\sqrt{AT})$ for any communicating MDP with $S$ states, $A$ actions and diameter $D$. Here, regret compares the total reward achieved by the algorithm to the total expected reward of an optimal infinite-horizon undiscounted average reward policy, in time horizon $T$. This result closely matches the known lower bound of $\Omega(\sqrt{DSAT})$. Our techniques involve proving some novel results about the anti-concentration of the Dirichlet distribution, which may be of independent interest.
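The regret notion described in words above can be written out as a short formula. The notation here is an assumption for illustration: $\lambda^{*}$ denotes the optimal infinite-horizon average reward (gain) and $r_t$ the reward the algorithm collects at step $t$.

$$
\mathcal{R}(T) = T\lambda^{*} - \sum_{t=1}^{T} r_t
$$

Comparing the upper bound $\tilde{O}(DS\sqrt{AT})$ with the lower bound $\Omega(\sqrt{DSAT})$ shows how close the match is. Ignoring logarithmic factors, the ratio is

$$
\frac{DS\sqrt{AT}}{\sqrt{DSAT}} = \sqrt{DS},
$$

so the two bounds agree in their dependence on $A$ and $T$ and differ by a factor of $\sqrt{DS}$.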

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms