Posterior Sampling for Large Scale Reinforcement Learning

Georgios Theocharous; Zheng Wen; Yasin Abbasi-Yadkori; Nikos Vlassis

arXiv:1711.07979 · cs.LG · October 24, 2018 · 19 cites

TL;DR

This paper introduces DS-PSRL, a non-episodic posterior sampling algorithm for reinforcement learning that switches policies on a deterministic, model-independent schedule, comes with a Bayesian regret bound, and applies to multi-parameter and continuous state-action problems.

Contribution

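The paper presents deterministic schedule PSRL (DS-PSRL), whose episode switching schedule is deterministic and model-independent. The authors describe it as efficient in time, sample, and space complexity, and they prove a Bayesian regret bound under mild assumptions.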

Findings

01

Is compared with state-of-the-art PSRL algorithms on standard discrete and continuous benchmark problems

02

Provides a Bayesian regret bound under mild assumptions

03

Applicable to multi-parameter and continuous state-action problems

Abstract

We propose a practical non-episodic PSRL algorithm that, unlike recent state-of-the-art PSRL algorithms, uses a deterministic, model-independent episode switching schedule. Our algorithm, termed deterministic schedule PSRL (DS-PSRL), is efficient in terms of time, sample, and space complexity. We prove a Bayesian regret bound under mild assumptions. Our result is more generally applicable to multiple parameters and continuous state-action problems. We compare our algorithm with state-of-the-art PSRL algorithms on standard discrete and continuous problems from the literature. Finally, we show how the assumptions of our algorithm satisfy a sensible parametrization for a large class of problems in sequential recommendations.
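
To make the idea of a deterministic, model-independent switching schedule concrete, here is a minimal sketch in Python. It is not the authors' implementation. The abstract does not state the schedule, so the sketch assumes a doubling rule (resample at t = 1, 2, 4, 8, ...), and it uses a one-state problem with Bernoulli rewards and Beta posteriors as a stand-in for a full MDP. The names is_switch_time and run_deterministic_schedule_sampling are hypothetical.

```python
import random
from typing import List, Tuple


def is_switch_time(t: int) -> bool:
    """Assumed schedule: switch when t is a power of two (1, 2, 4, 8, ...)."""
    return t >= 1 and (t & (t - 1)) == 0


def run_deterministic_schedule_sampling(
    true_means: List[float], horizon: int, seed: int = 0
) -> Tuple[List[int], List[int], List[int]]:
    """Posterior sampling on a one-state problem with a fixed switching schedule.

    Returns the switch times and the final Beta posterior counts per action.
    """
    rng = random.Random(seed)
    n_actions = len(true_means)
    successes = [1] * n_actions  # Beta(1, 1) prior for every action
    failures = [1] * n_actions
    switch_times: List[int] = []
    action = 0

    for t in range(1, horizon + 1):
        # The decision to resample depends only on t, never on the model
        # or on the data observed so far.
        if is_switch_time(t):
            sampled = [
                rng.betavariate(successes[a], failures[a])
                for a in range(n_actions)
            ]
            # Act greedily with respect to the sampled model until the
            # next scheduled switch.
            action = max(range(n_actions), key=lambda a: sampled[a])
            switch_times.append(t)

        # The posterior is updated at every step, even between switches.
        reward = 1 if rng.random() < true_means[action] else 0
        if reward:
            successes[action] += 1
        else:
            failures[action] += 1

    return switch_times, successes, failures


if __name__ == "__main__":
    times, succ, fail = run_deterministic_schedule_sampling([0.3, 0.7], horizon=20)
    print("switch times:", times)
```

Under this assumed doubling rule the number of posterior samples grows only logarithmically with the horizon, which is one way a fixed schedule can keep computation low. In a full MDP the greedy step would be replaced by planning in the sampled model.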

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Smart Grid Energy Management