Minimax Regret for Stochastic Shortest Path
Alon Cohen, Yonathan Efroni, Yishay Mansour, Aviv Rosenberg

TL;DR
This paper establishes near-optimal (minimax) regret bounds for learning in stochastic shortest path (SSP) problems. The algorithm is built on a reduction from SSP to finite-horizon MDPs.
Contribution
The authors derive tight minimax regret bounds for SSP with unknown costs and dynamics, and propose a novel reduction from SSP to finite-horizon MDPs on which the algorithm and its analysis are based.
Findings
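Upper bound matches the lower bound of Rosenberg et al. [2020] for B* ≥ 1; a new matching lower bound is proved for B* < 1
Improves the regret bound of Rosenberg et al. [2020] by a factor of √|S|
Algorithm for finite-horizon MDPs whose regret depends polynomially on the expected cost of the optimal policy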
Abstract
We study the Stochastic Shortest Path (SSP) problem in which an agent has to reach a goal state in minimum total expected cost. In the learning formulation of the problem, the agent has no prior knowledge about the costs and dynamics of the model. She repeatedly interacts with the model for K episodes, and has to minimize her regret. In this work we show that the minimax regret for this setting is Õ(√((B*² + B*)|S||A|K)), where B* is a bound on the expected cost of the optimal policy from any state, S is the state space, and A is the action space. This matches the Ω(√(B*²|S||A|K)) lower bound of Rosenberg et al. [2020] for B* ≥ 1, and improves their regret bound by a factor of √|S|. For B* < 1 we prove a matching lower bound of Ω(√(B*|S||A|K)). Our algorithm is based on a novel…
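To make the quantity B* concrete, here is a minimal sketch that computes the optimal expected cost-to-goal on a tiny SSP by value iteration and takes its maximum over states. It assumes the costs and transitions are known, which is the planning problem rather than the learning setting the paper studies. The three-state instance, the function name `ssp_value_iteration`, and all parameter names are hypothetical and chosen only for illustration.

```python
import numpy as np

def ssp_value_iteration(P, c, goal, tol=1e-10, max_iter=100_000):
    """Value iteration for a known SSP (illustrative, not the paper's algorithm).

    P: array of shape (S, A, S), P[s, a, s2] = transition probability.
    c: array of shape (S, A), immediate cost of action a in state s.
    goal: index of the goal state, treated as absorbing with zero cost.
    Returns the optimal cost-to-go vector and a greedy policy.
    """
    V = np.zeros(P.shape[0])
    for _ in range(max_iter):
        Q = c + P @ V              # shape (S, A): cost now plus expected cost-to-go
        V_new = Q.min(axis=1)
        V_new[goal] = 0.0          # reaching the goal ends the episode
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    policy = (c + P @ V).argmin(axis=1)
    return V, policy

# Hypothetical toy instance: states 0 and 1, plus goal state 2; two actions.
P = np.zeros((3, 2, 3))
c = np.zeros((3, 2))
P[0, 0, 1] = 1.0; c[0, 0] = 1.0                   # state 0, action 0: move to state 1
P[0, 1, 2] = 0.5; P[0, 1, 0] = 0.5; c[0, 1] = 1.0  # state 0, action 1: reach goal w.p. 0.5
P[1, 0, 2] = 1.0; c[1, 0] = 0.5                   # state 1, action 0: cheap step to goal
P[1, 1, 2] = 1.0; c[1, 1] = 1.0                   # state 1, action 1: costly step to goal
P[2, :, 2] = 1.0                                  # goal is absorbing, zero cost

V, policy = ssp_value_iteration(P, c, goal=2)
B_star = V.max()   # smallest valid bound on the optimal expected cost from any state
print(V, policy, B_star)
```

In this toy instance the optimal policy routes state 0 through state 1, so the largest optimal cost-to-go is attained at state 0. In the learning setting P and c are unknown, and regret is measured against this optimal policy over K episodes.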
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Age of Information Optimization
