Policy Optimization for Stochastic Shortest Path

Liyu Chen; Haipeng Luo; Aviv Rosenberg

arXiv:2202.03334 · cs.LG · February 8, 2022

TL;DR

This paper initiates the study of policy optimization for the stochastic shortest path (SSP) problem. It proposes an algorithm for each of several settings, proves near-optimal regret bounds for most of them, and introduces a new approximation scheme for SSP problems.

Contribution

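The paper develops a new approximation scheme, called stacked discounted approximation, and extends policy optimization to SSP in stochastic and adversarial environments under full-information or bandit feedback, achieving near-optimal regret bounds in most of these settings.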

Findings

01 Achieves near-optimal regret bounds in multiple SSP settings.

02 Introduces a new approximation scheme called stacked discounted approximation.

03 Enables learning near-stationary policies with minimal changes during episodes.

Abstract

Policy optimization is among the most popular and successful reinforcement learning algorithms, and there is increasing interest in understanding its theoretical guarantees. In this work, we initiate the study of policy optimization for the stochastic shortest path (SSP) problem, a goal-oriented reinforcement learning model that strictly generalizes the finite-horizon model and better captures many applications. We consider a wide range of settings, including stochastic and adversarial environments under full information or bandit feedback, and propose a policy optimization algorithm for each setting that makes use of novel correction terms and/or variants of dilated bonuses (Luo et al., 2021). For most settings, our algorithm is shown to achieve a near-optimal regret bound. One key technical contribution of this work is a new approximation scheme to tackle SSP problems that we call…

Taxonomy

Topics: Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification