(More) Efficient Reinforcement Learning via Posterior Sampling

Ian Osband; Daniel Russo; Benjamin Van Roy

arXiv:1306.0940·stat.ML·December 30, 2013·247 cites

TL;DR

This paper studies posterior sampling for reinforcement learning (PSRL), an alternative to optimism-based exploration. It establishes a bound on the algorithm's expected regret and shows in simulation that PSRL outperforms existing algorithms.

Contribution

It introduces PSRL, a simple, computationally efficient algorithm with near-optimal regret bounds, and shows its advantages over existing methods.

Findings

01

PSRL achieves an $\tilde{O}(\tau S \sqrt{AT})$ regret bound.

02

PSRL outperforms existing algorithms in simulations.

03

The approach naturally encodes prior knowledge.

Abstract

Most provably-efficient learning algorithms introduce optimism about poorly-understood states and actions to encourage exploration. We study an alternative approach for efficient exploration, posterior sampling for reinforcement learning (PSRL). This algorithm proceeds in repeated episodes of known duration. At the start of each episode, PSRL updates a prior distribution over Markov decision processes and takes one sample from this posterior. PSRL then follows the policy that is optimal for this sample during the episode. The algorithm is conceptually simple, computationally efficient and allows an agent to encode prior knowledge in a natural way. We establish an $\tilde{O}(\tau S \sqrt{AT})$ bound on the expected regret, where $T$ is time, $\tau$ is the episode length and $S$ and $A$ are the cardinalities of the state and action spaces. This bound is one of the first for an algorithm…
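
The episode loop described in the abstract (sample one MDP from the posterior, solve it, follow the resulting policy for $\tau$ steps, then update) can be sketched for a small tabular problem as follows. This is an illustrative sketch, not the authors' implementation. It assumes a Dirichlet prior over each transition row, a standard normal prior over mean rewards with unit-variance Gaussian reward noise, and a fixed start state 0. The function names `sample_mdp`, `solve_finite_horizon`, and `psrl` are hypothetical.

```python
import numpy as np


def sample_mdp(rng, trans_counts, reward_sum, reward_n):
    """Draw one MDP (P, R) from the posterior (assumed conjugate priors)."""
    S, A, _ = trans_counts.shape
    P = np.empty((S, A, S))
    for s in range(S):
        for a in range(A):
            # Dirichlet posterior over next-state probabilities.
            P[s, a] = rng.dirichlet(trans_counts[s, a])
    # N(0, 1) prior on each mean reward, unit-variance noise (assumption):
    # posterior is N(sum / (n + 1), 1 / (n + 1)).
    post_mean = reward_sum / (reward_n + 1.0)
    post_std = 1.0 / np.sqrt(reward_n + 1.0)
    R = rng.normal(post_mean, post_std)
    return P, R


def solve_finite_horizon(P, R, tau):
    """Backward induction; returns a (tau, S) array of greedy actions."""
    S, A, _ = P.shape
    V = np.zeros(S)
    policy = np.zeros((tau, S), dtype=int)
    for h in reversed(range(tau)):
        Q = R + P @ V  # shape (S, A)
        policy[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return policy


def psrl(true_P, true_R, tau, n_episodes, seed=0):
    """Run PSRL against a known simulator; returns per-episode returns."""
    rng = np.random.default_rng(seed)
    S, A, _ = true_P.shape
    trans_counts = np.ones((S, A, S))  # Dirichlet(1, ..., 1) prior
    reward_sum = np.zeros((S, A))
    reward_n = np.zeros((S, A))
    returns = []
    for _ in range(n_episodes):
        # One posterior sample per episode, then act optimally for it.
        P, R = sample_mdp(rng, trans_counts, reward_sum, reward_n)
        policy = solve_finite_horizon(P, R, tau)
        s, total = 0, 0.0  # fixed start state (assumption)
        for h in range(tau):
            a = policy[h, s]
            r = true_R[s, a] + rng.normal()
            s_next = rng.choice(S, p=true_P[s, a])
            # Conjugate posterior updates from the observed transition.
            trans_counts[s, a, s_next] += 1
            reward_sum[s, a] += r
            reward_n[s, a] += 1
            total += r
            s = s_next
        returns.append(total)
    return np.array(returns), trans_counts
```

Prior knowledge enters only through the initial values of `trans_counts`, `reward_sum`, and `reward_n`, so a more informative prior amounts to starting these arrays at different values. Exploration comes from the randomness of the posterior sample rather than from an explicit optimism bonus.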

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms