A Unified Algorithm for Stochastic Path Problems
Christoph Dann, Chen-Yu Wei, Julian Zimmert

TL;DR
This paper analyzes a simple optimistic reinforcement learning algorithm for stochastic path problems. It gives the first regret guarantees for the general setting, along with procedures for adapting to an unknown reward scale.
Contribution
It offers the first regret guarantees for general stochastic path problems and develops procedures for adapting to an unknown reward scale in the stochastic shortest path (SSP) and stochastic longest path (SLP) special cases.
Findings
Regret bounds match the best known for SSP with non-positive rewards.
An adaptation procedure for unknown reward scale $B_\star$ in SSP is proposed.
A lower bound shows unavoidable costs for adaptation in SLP.
Abstract
We study reinforcement learning in stochastic path (SP) problems. The goal in these problems is to maximize the expected sum of rewards until the agent reaches a terminal state. We provide the first regret guarantees in this general problem by analyzing a simple optimistic algorithm. Our regret bound matches the best known results for the well-studied special case of stochastic shortest path (SSP) with all non-positive rewards. For SSP, we present an adaptation procedure for the case when the scale of rewards $B_\star$ is unknown. We show that there is no price for adaptation, and our regret bound matches that with a known $B_\star$. We also provide a scale adaptation procedure for the special case of stochastic longest paths (SLP) where all rewards are non-negative. However, unlike in SSP, we show through a lower bound that there is an unavoidable price for adaptation.
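To make the objective concrete, the sketch below computes the optimal expected sum of rewards until termination in a toy stochastic path problem with a known model. It is not the paper's algorithm, which learns under unknown dynamics. It is plain value iteration, and the states, actions, probabilities, and rewards are all hypothetical numbers chosen for illustration.

```python
# Toy stochastic path problem with a KNOWN model (all numbers are hypothetical).
# transitions[state][action] = list of (probability, next_state, reward)
TERMINAL = "g"
transitions = {
    "s0": {
        "safe": [(1.0, "s1", -1.0)],
        "risky": [(0.5, TERMINAL, 2.0), (0.5, "s0", -1.0)],
    },
    "s1": {
        "exit": [(1.0, TERMINAL, 1.0)],
    },
}


def value_iteration(transitions, terminal, sweeps=1000, tol=1e-10):
    """Return the optimal expected total reward until termination per state."""
    values = {s: 0.0 for s in transitions}
    values[terminal] = 0.0  # no reward is collected after termination
    for _ in range(sweeps):
        delta = 0.0
        for s, actions in transitions.items():
            # Best one-step lookahead: immediate reward plus value of next state.
            best = max(
                sum(p * (r + values[nxt]) for p, nxt, r in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    return values


values = value_iteration(transitions, TERMINAL)
print(values)
```

The rewards here mix signs, as in the general SP setting. Restricting them to be non-positive would give an SSP instance, and restricting them to be non-negative would give an SLP instance.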
Taxonomy
Topics: Auction Theory and Applications · Advanced Bandit Algorithms Research · Risk and Portfolio Optimization
