Regret Guarantees for Linear Contextual Stochastic Shortest Path

Dor Polikar; Alon Cohen

arXiv:2511.12534·cs.LG·November 18, 2025

TL;DR

This paper introduces LR-CSSP, an algorithm for linear contextual stochastic shortest path (CSSP) problems. It comes with regret guarantees, handles continuous context spaces, and ensures that all episodes terminate within a reasonable number of time steps.

Contribution

The paper proposes LR-CSSP, an algorithm for linear CSSP whose regret bound grows as $K^{2/3}$ in the number of episodes $K$. It targets the setting where the transition dynamics, the loss functions, and the mapping from context to MDP are all unknown to the learner.

Findings

01

LR-CSSP achieves regret that is sublinear in the number of episodes $K$, with the stated bound scaling as $K^{2/3}$.

02

The algorithm handles continuous context spaces effectively.

03

All episodes terminate within a reasonable number of time steps.

Abstract

We define the problem of linear Contextual Stochastic Shortest Path (CSSP), where at the beginning of each episode, the learner observes an adversarially chosen context that determines the MDP through a fixed but unknown linear function. The learner's objective is to reach a designated goal state with minimal expected cumulative loss, despite having no prior knowledge of the transition dynamics, loss functions, or the mapping from context to MDP. In this work, we propose LR-CSSP, an algorithm that achieves a regret bound of $O(K^{2/3} d^{2/3} |S| |A|^{1/3} B_{\star}^{2} T_{\star} \log(1/\delta))$, where $K$ is the number of episodes, $d$ is the context dimension, $S$ and $A$ are the sets of states and actions respectively, $B_{\star}$ bounds the optimal cumulative loss and $T_{\star}$, unknown to the learner, bounds the expected time for the optimal policy to reach the goal. In the…
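To make the scaling of the stated bound concrete, here is a minimal sketch that evaluates its leading term. It assumes the constant hidden by the $O(\cdot)$ notation equals 1, so only ratios between values are meaningful. The function names and the example problem sizes are hypothetical and do not come from the paper; only the exponents are taken from the abstract.

```python
import math


def regret_bound_scale(K, d, n_states, n_actions, B_star, T_star, delta):
    """Leading term of the bound quoted in the abstract.

    Assumption: the constant hidden by the O(.) notation is set to 1,
    so only ratios between values are meaningful, not the values themselves.
    """
    return (
        K ** (2 / 3)
        * d ** (2 / 3)
        * n_states
        * n_actions ** (1 / 3)
        * B_star ** 2
        * T_star
        * math.log(1 / delta)
    )


def per_episode_scale(K, **problem):
    """Bound divided by the number of episodes (average regret per episode)."""
    return regret_bound_scale(K, **problem) / K


# Hypothetical problem sizes, chosen only for illustration.
problem = dict(d=4, n_states=10, n_actions=3, B_star=5.0, T_star=20.0, delta=0.05)

for K in (1_000, 8_000, 64_000):
    print(K, regret_bound_scale(K, **problem), per_episode_scale(K, **problem))
```

Under these assumptions, multiplying $K$ by 8 multiplies the bound by $8^{2/3} = 4$ and halves the per-episode average, which is what "sublinear regret" means here: the average regret per episode shrinks as $K^{-1/3}$.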

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques