Regret Guarantees for Linear Contextual Stochastic Shortest Path
Dor Polikar, Alon Cohen

TL;DR
This paper introduces LR-CSSP, an algorithm for linear contextual stochastic shortest path (CSSP) problems that comes with regret guarantees, handles continuous context spaces, and ensures that all episodes terminate.
Contribution
The paper proposes LR-CSSP, a novel algorithm with regret bounds for linear CSSP, addressing challenges of unknown dynamics and continuous contexts in stochastic shortest path problems.
Findings
LR-CSSP achieves sublinear regret bounds.
The algorithm handles continuous context spaces effectively.
LR-CSSP ensures that every episode terminates (reaches the goal state) within a reasonable number of time steps.
Abstract
We define the problem of linear Contextual Stochastic Shortest Path (CSSP), where at the beginning of each episode, the learner observes an adversarially chosen context that determines the MDP through a fixed but unknown linear function. The learner's objective is to reach a designated goal state with minimal expected cumulative loss, despite having no prior knowledge of the transition dynamics, loss functions, or the mapping from context to MDP. In this work, we propose LR-CSSP, an algorithm whose regret bound is stated in terms of the number of episodes, the context dimension, the sets of states and actions, a bound on the optimal cumulative loss, and a bound, unknown to the learner, on the expected time for the optimal policy to reach the goal. In the…
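To make the setting in the abstract concrete, the following is a minimal Python sketch of one contextual SSP episode. It is not LR-CSSP. The abstract says only that the context determines the MDP through a fixed linear function; the specific parametrization here (a context on the probability simplex that mixes d base MDPs) is an assumption chosen so that the resulting transition probabilities are valid. All names (`context_to_mdp`, `optimal_cost_to_go`, `run_episode`, `base_P`, `base_loss`) are hypothetical.

```python
import numpy as np

# ASSUMPTION: the context lies on the probability simplex and mixes d base MDPs.
# base_P[i, s, a, s'] are transition probabilities; the last index s' = S is the goal.
# base_loss[i, s, a] are per-step losses. One non-goal state, two actions, d = 2.
base_P = np.array([
    [[[0.0, 1.0], [0.5, 0.5]]],   # base MDP 0: action 1 reaches the goal w.p. 0.5
    [[[0.0, 1.0], [0.9, 0.1]]],   # base MDP 1: action 1 reaches the goal w.p. 0.1
])
base_loss = np.array([
    [[1.0, 0.4]],
    [[1.0, 0.4]],
])

def context_to_mdp(context, base_P, base_loss):
    """Linear map from a context vector to the MDP's transitions and losses."""
    P = np.tensordot(context, base_P, axes=1)        # shape (S, A, S + 1)
    loss = np.tensordot(context, base_loss, axes=1)  # shape (S, A)
    return P, loss

def optimal_cost_to_go(P, loss, iters=1000):
    """Value iteration for the SSP; the goal state has cost-to-go 0."""
    S = loss.shape[0]
    V = np.zeros(S + 1)
    for _ in range(iters):
        Q = loss + P @ V          # expected loss of each (state, action) pair
        V[:S] = Q.min(axis=1)
    return V[:S], Q.argmin(axis=1)

def run_episode(P, loss, policy, rng, start=0, max_steps=10_000):
    """Follow `policy` until the goal is reached or max_steps is hit."""
    S = loss.shape[0]
    s, total, steps = start, 0.0, 0
    while s != S and steps < max_steps:
        a = policy[s]
        total += loss[s, a]
        s = rng.choice(S + 1, p=P[s, a])
        steps += 1
    return total, steps, s == S

# One episode: observe a context, build the MDP, compare incurred loss to the optimum.
rng = np.random.default_rng(0)
context = np.array([1.0, 0.0])
P, loss = context_to_mdp(context, base_P, base_loss)
V_star, pi_star = optimal_cost_to_go(P, loss)
incurred, steps, reached_goal = run_episode(P, loss, pi_star, rng)
episode_regret = incurred - V_star[0]  # a learner's regret sums this over episodes
```

In this toy instance the learner is handed the true MDP, so the policy it follows is optimal and its regret is zero in expectation. In the setting the paper studies, the mapping from context to MDP is unknown, and a poorly chosen policy may also take a long time to reach the goal, which is why the sketch caps the episode with `max_steps`.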
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques
