Improved No-Regret Algorithms for Stochastic Shortest Path with Linear MDP
Liyu Chen, Rahul Jain, Haipeng Luo

TL;DR
This paper introduces two new no-regret algorithms for stochastic shortest path problems with linear MDPs, improving computational efficiency and regret bounds over previous methods, with one achieving horizon-free regret.
Contribution
The paper presents two novel algorithms for linear MDP SSPs, one computationally efficient with improved regret bounds, and another horizon-free with near-optimal regret, advancing the state-of-the-art.
Findings
Efficient algorithm achieves $\tilde{O}(\text{poly}(d, B_{\text{max}}, T_{\text{hit}}) \times \sqrt{K})$ regret.
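With a slight modification, the efficient algorithm attains logarithmic regret, with a bound that depends on the minimum sub-optimality gap and the minimum cost over all state-action pairs.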
Second algorithm achieves horizon-free regret $\tilde{O}(d^{3.5} B_{\text{max}} \times \sqrt{K})$, nearly matching lower bounds.
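To make the regret quantities above concrete, here is a minimal Python sketch. It assumes the standard SSP notion of regret (total cost incurred over $K$ episodes minus $K$ times the optimal policy's expected cost from the initial state); the function names, the `constant` parameter, and the toy cost sequence are all hypothetical and are not taken from the paper.

```python
import math


def ssp_regret(episode_costs, optimal_cost):
    """Total incurred cost minus K times the optimal expected cost.

    Assumes the standard SSP regret definition; K is len(episode_costs).
    """
    return sum(episode_costs) - len(episode_costs) * optimal_cost


def sqrt_k_envelope(K, d, b_max, constant=1.0):
    """Hypothetical envelope constant * d^3.5 * B_max * sqrt(K).

    Ignores the logarithmic factors hidden by the tilde in O-tilde.
    """
    return constant * d ** 3.5 * b_max * math.sqrt(K)


# Toy data (made up): per-episode cost approaches the optimal cost 2.0.
optimal_cost = 2.0
episode_costs = [optimal_cost + 1.0 / math.sqrt(k) for k in range(1, 10001)]

regret = ssp_regret(episode_costs, optimal_cost)
average_regret = regret / len(episode_costs)
```

In this toy run the cumulative regret grows on the order of $\sqrt{K}$, so the average regret per episode shrinks toward zero, which is what a no-regret guarantee means in practice.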
Abstract
We introduce two new no-regret algorithms for the stochastic shortest path (SSP) problem with a linear MDP that significantly improve over the only existing results of (Vial et al., 2021). Our first algorithm is computationally efficient and achieves a regret bound of $\tilde{O}(\text{poly}(d, B_{\text{max}}, T_{\text{hit}}) \sqrt{K})$, where $d$ is the dimension of the feature space, $B_{\text{max}}$ and $T_{\text{hit}}$ are upper bounds of the expected costs and hitting time of the optimal policy respectively, and $K$ is the number of episodes. The same algorithm with a slight modification also achieves logarithmic regret, of an order that depends on the minimum sub-optimality gap and the minimum cost over all state-action pairs. Our result is obtained by developing a simpler and…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques
