Improved No-Regret Algorithms for Stochastic Shortest Path with Linear MDP

Liyu Chen; Rahul Jain; Haipeng Luo

arXiv:2112.09859 · cs.LG · December 21, 2021

Open Access

TL;DR

This paper introduces two new no-regret algorithms for stochastic shortest path problems with linear MDPs, improving computational efficiency and regret bounds over previous methods, with one achieving horizon-free regret.

Contribution

The paper presents two algorithms for SSP with a linear MDP: a computationally efficient one with improved regret bounds, and a horizon-free one with near-optimal regret.

Findings

01

The computationally efficient algorithm achieves $\tilde{O}(\text{poly}(d, B_{\star}, T_{\star}) \sqrt{K})$ regret.

02

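A slightly modified version of the same algorithm attains logarithmic regret, with a bound that depends on the minimum sub-optimality gap $\text{gap}_{\min}$ and the minimum cost $c_{\min}$.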

03

The second algorithm achieves horizon-free regret $\tilde{O}(d^{3.5} B_{\star} \sqrt{K})$, nearly matching lower bounds.

Abstract

We introduce two new no-regret algorithms for the stochastic shortest path (SSP) problem with a linear MDP that significantly improve over the only existing results of (Vial et al., 2021). Our first algorithm is computationally efficient and achieves a regret bound $\tilde{O}(\sqrt{d^{3} B_{\star}^{2} T_{\star} K})$, where $d$ is the dimension of the feature space, $B_{\star}$ and $T_{\star}$ are upper bounds of the expected costs and hitting time of the optimal policy respectively, and $K$ is the number of episodes. The same algorithm with a slight modification also achieves logarithmic regret of order $O\left(\frac{d^{3} B_{\star}^{4}}{c_{\min}^{2} \text{gap}_{\min}} \ln^{5} \frac{d B_{\star} K}{c_{\min}}\right)$, where $\text{gap}_{\min}$ is the minimum sub-optimality gap and $c_{\min}$ is the minimum cost over all state-action pairs. Our result is obtained by developing a simpler and…
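To get a feel for how the three bounds scale, the short Python sketch below evaluates them numerically. It is only an illustration: constants and logarithmic factors hidden by the $\tilde{O}$ notation are dropped, the first bound is taken in the form $\sqrt{d^{3} B_{\star}^{2} T_{\star} K}$ (consistent with the $\sqrt{K}$ scaling listed in the findings), and the function names and parameter values are hypothetical rather than taken from the paper.

```python
import math


def sqrt_k_bound(d, b_star, t_star, k):
    """First algorithm: sqrt(d^3 * B^2 * T * K), constants and logs dropped."""
    return math.sqrt(d**3 * b_star**2 * t_star * k)


def horizon_free_bound(d, b_star, k):
    """Second algorithm: d^3.5 * B * sqrt(K), with no dependence on T."""
    return d**3.5 * b_star * math.sqrt(k)


def log_bound(d, b_star, c_min, gap_min, k):
    """Modified first algorithm: d^3 B^4 / (c_min^2 gap_min) * ln^5(d B K / c_min)."""
    log_term = math.log(d * b_star * k / c_min)
    return (d**3 * b_star**4) / (c_min**2 * gap_min) * log_term**5


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    d, b_star, c_min, gap_min = 2, 1.0, 0.5, 0.1
    for k in (10**3, 10**6, 10**9):
        print(
            k,
            sqrt_k_bound(d, b_star, t_star=100, k=k),
            horizon_free_bound(d, b_star, k),
            log_bound(d, b_star, c_min, gap_min, k),
        )
```

Under these simplifications the ratio of the horizon-free bound to the first bound is $d^{2} / \sqrt{T_{\star}}$, so the horizon-free expression is the smaller of the two only when $T_{\star} > d^{4}$. With hidden constants and log factors restored, the actual crossover point may differ.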

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques