Online Learning for Stochastic Shortest Path Model via Posterior Sampling

Mehdi Jafarnia-Jahromi; Liyu Chen; Rahul Jain; Haipeng Luo

arXiv:2106.05335·cs.LG·June 11, 2021·6 cites

Open Access

TL;DR

This paper introduces PSRL-SSP, a novel posterior sampling algorithm for online reinforcement learning in stochastic shortest path (SSP) problems. It comes with a Bayesian regret bound and outperforms existing optimism-based methods in numerical experiments.

Contribution

The paper presents the first posterior sampling-based algorithm for SSP, with a proven Bayesian regret bound and no need for hyper-parameter tuning.

Findings

01  Achieves a Bayesian regret bound of O(B_⋆ S √(AK)).

02  Outperforms previous optimism-based algorithms in numerical experiments.

03  Requires only knowledge of a prior distribution; there are no hyper-parameters to tune.
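For reference, the Bayesian regret in bounds of this kind is usually the expected total cost over $K$ episodes minus $K$ times the optimal expected cost from the initial state. The notation below ($I_k$ for the length of episode $k$, $s_{\text{init}}$, $V^{\star}$) is assumed for illustration and may not match the paper's exactly.

$$
R_K = \sum_{k=1}^{K} \sum_{i=1}^{I_k} c\big(s_i^k, a_i^k\big) - K\,V^{\star}(s_{\text{init}}),
\qquad
\mathbb{E}[R_K] = O\big(B_{\star} S \sqrt{AK}\big),
$$

where the expectation is over the prior on the unknown model and the randomness of the environment and the algorithm. Dividing by $K$, the average excess cost per episode is $O(B_{\star} S \sqrt{A/K})$, which vanishes as $K$ grows.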

Abstract

We consider the problem of online reinforcement learning for the Stochastic Shortest Path (SSP) problem modeled as an unknown MDP with an absorbing state. We propose PSRL-SSP, a simple posterior sampling-based reinforcement learning algorithm for the SSP problem. The algorithm operates in epochs. At the beginning of each epoch, a sample is drawn from the posterior distribution on the unknown model dynamics, and the optimal policy with respect to the drawn sample is followed during that epoch. An epoch completes if either the number of visits to the goal state in the current epoch exceeds that of the previous epoch, or the number of visits to any of the state-action pairs is doubled. We establish a Bayesian regret bound of $O(B_{\star} S \sqrt{AK})$, where $B_{\star}$ is an upper bound on the expected cost of the optimal policy, $S$ is the size of the state space, $A$ is the size of the…
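The epoch structure described in the abstract can be sketched in Python as below. This is a minimal illustration under assumptions the abstract does not state: costs are known and strictly positive, each row $P(\cdot \mid s,a)$ has an independent Dirichlet prior (so the posterior is Dirichlet with the observed counts added), the goal is the last state index, and the optimal policy for each sampled model is computed by plain value iteration. The names `solve_ssp`, `psrl_ssp` and the `prior` parameter are hypothetical, and the test `n(s,a) > 2 n_start(s,a)` is one reading of "doubled"; the paper's exact rule may differ.

```python
import numpy as np


def solve_ssp(P, c, tol=1e-8, max_iter=100_000):
    """Value iteration for an SSP whose goal is the last state index.

    P: (S, A, S+1) transition probabilities out of the S non-goal states.
    c: (S, A) strictly positive costs. Returns (V, greedy policy).
    """
    V = np.zeros(P.shape[2])  # V[-1] is the goal, whose cost-to-go stays 0
    for _ in range(max_iter):
        Q = c + P @ V  # (S, A): immediate cost + expected cost-to-go
        V_new = np.append(Q.min(axis=1), 0.0)
        done = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if done:
            break
    policy = (c + P @ V).argmin(axis=1)
    return V, policy


def psrl_ssp(P_true, c, s_init, num_episodes, prior=1.0, seed=0):
    """Hypothetical PSRL-SSP loop; returns (total cost, number of epochs)."""
    rng = np.random.default_rng(seed)
    S, A = c.shape
    goal = S
    counts = np.zeros((S, A, S + 1))  # n(s, a, s') transition counts
    s, episodes_done, total_cost = s_init, 0, 0.0
    goal_prev_epoch, num_epochs = 0, 0
    while episodes_done < num_episodes:
        # New epoch: draw one model from the Dirichlet posterior and plan on it.
        num_epochs += 1
        P_sample = np.empty_like(counts)
        for i in range(S):
            for b in range(A):
                P_sample[i, b] = rng.dirichlet(prior + counts[i, b])
        _, policy = solve_ssp(P_sample, c)
        start_visits = counts.sum(axis=2)  # n(s, a) at the start of the epoch
        goal_this_epoch = 0
        while episodes_done < num_episodes:
            a = policy[s]
            s_next = rng.choice(S + 1, p=P_true[s, a])  # step in the real MDP
            counts[s, a, s_next] += 1
            total_cost += c[s, a]
            if s_next == goal:
                episodes_done += 1
                goal_this_epoch += 1
                s = s_init  # a new episode restarts from the initial state
            else:
                s = s_next
            # Epoch ends when goal visits exceed the previous epoch's count,
            # or some state-action visit count more than doubles.
            visits = counts.sum(axis=2)
            if goal_this_epoch > goal_prev_epoch or np.any(visits > 2 * start_visits):
                break
        goal_prev_epoch = goal_this_epoch
    return total_cost, num_epochs


# Usage on a small random SSP (3 non-goal states, 2 actions), made up for illustration.
rng = np.random.default_rng(1)
S, A = 3, 2
P_true = rng.dirichlet(np.ones(S + 1), size=(S, A))  # shape (S, A, S+1)
c = rng.uniform(0.1, 1.0, size=(S, A))
cost, epochs = psrl_ssp(P_true, c, s_init=0, num_episodes=50)
V_star, _ = solve_ssp(P_true, c)
print(f"avg cost/episode {cost / 50:.3f}, V*(s_init) {V_star[0]:.3f}, epochs {epochs}")
```

Because every Dirichlet sample puts positive mass on the goal from each state, every policy of a sampled model is proper, so value iteration converges; with a prior that can put zero mass on the goal, a properness check would be needed before planning.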

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Smart Grid Energy Management