Multi-armed Bandit for Stochastic Shortest Path in Mixed Autonomy

Yu Bai; Yiming Li; Xi Xiong

arXiv:2505.05878 · math.OC · May 12, 2025 · ITSC

Open Access

TL;DR

This paper introduces an algorithm based on Real-Time Dynamic Programming (RTDP) that uses Upper Confidence Bound (UCB) exploration for routing in mixed-autonomy traffic, balancing exploration and exploitation to find optimal strategies in stochastic environments.

Contribution

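The paper develops an RTDP-based algorithm with UCB exploration, drawn from the multi-armed bandit (MAB) paradigm, for stochastic routing in mixed-autonomy traffic networks. It establishes a path-level regret upper bound that guarantees worst-case convergence and reports improved computational efficiency.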

Findings

01. The algorithm guarantees worst-case convergence to optimal policies.

02. It outperforms standard RTDP in highly stochastic environments.

03. It demonstrates superior computational efficiency over Value Iteration.

Abstract

In mixed-autonomy traffic networks, autonomous vehicles (AVs) are required to make sequential routing decisions under uncertainty caused by dynamic and heterogeneous interactions with human-driven vehicles (HDVs). Early-stage greedy decisions made by AVs during interactions with the environment often result in insufficient exploration, leading to failures in discovering globally optimal strategies. The exploration-exploitation balancing mechanism inherent in multi-armed bandit (MAB) methods is well-suited for addressing such problems. Based on the Real-Time Dynamic Programming (RTDP) framework, we introduce the Upper Confidence Bound (UCB) exploration strategy from the MAB paradigm and propose a novel algorithm. We establish the path-level regret upper bound under the RTDP framework, which guarantees the worst-case convergence of the proposed algorithm. Extensive numerical experiments…
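
The idea in the abstract, RTDP trials whose action choice is driven by a UCB-style index rather than a purely greedy one, can be illustrated with a minimal sketch. This is not the authors' algorithm: the toy network `GRAPH`, the uniform cost noise, the exploration constant `c`, and the function names `rtdp_ucb` and `greedy_path` are all assumptions made for illustration. Because the objective is a cost to minimize, the sketch subtracts the confidence bonus (optimism in the face of uncertainty), which is the cost-minimizing counterpart of the usual UCB1 index.

```python
import math
import random

# Hypothetical toy network: node -> list of (next_node, mean_cost, noise_halfwidth).
# The route 0 -> 2 -> 3 is cheaper on average but much noisier than 0 -> 1 -> 3.
GRAPH = {
    0: [(1, 1.0, 0.2), (2, 2.0, 1.5)],
    1: [(3, 4.0, 0.5)],
    2: [(3, 1.5, 1.0)],
    3: [],  # goal node, no outgoing edges
}
GOAL = 3


def rtdp_ucb(graph, start, goal, trials=500, c=1.0, seed=0):
    """RTDP-style trials with a UCB-style (optimistic) edge index. Illustrative only."""
    rng = random.Random(seed)
    V = {s: 0.0 for s in graph}  # zero is an optimistic start for nonnegative costs
    n = {(s, i): 0 for s in graph for i in range(len(graph[s]))}  # edge pull counts
    mean = {k: 0.0 for k in n}  # empirical mean cost per edge
    visits = {s: 0 for s in graph}

    def index(s, i):
        # Untried edges get priority so every edge is sampled at least once.
        if n[(s, i)] == 0:
            return -math.inf
        bonus = c * math.sqrt(math.log(visits[s]) / n[(s, i)])
        return mean[(s, i)] + V[graph[s][i][0]] - bonus

    for _ in range(trials):
        s = start
        while s != goal:
            visits[s] += 1
            i = min(range(len(graph[s])), key=lambda j: index(s, j))
            nxt, mu, halfwidth = graph[s][i]
            cost = rng.uniform(mu - halfwidth, mu + halfwidth)  # sampled travel cost
            n[(s, i)] += 1
            mean[(s, i)] += (cost - mean[(s, i)]) / n[(s, i)]  # running average
            # Bellman backup at the visited node, over edges tried so far.
            V[s] = min(
                mean[(s, j)] + V[graph[s][j][0]]
                for j in range(len(graph[s]))
                if n[(s, j)] > 0
            )
            s = nxt
    return V, mean, n


def greedy_path(graph, V, mean, n, start, goal):
    """Follow the edge with the lowest estimated cost-to-go from each node."""
    path, s = [start], start
    while s != goal:
        tried = [j for j in range(len(graph[s])) if n[(s, j)] > 0]
        j = min(tried, key=lambda k: mean[(s, k)] + V[graph[s][k][0]])
        s = graph[s][j][0]
        path.append(s)
    return path


if __name__ == "__main__":
    V, mean, n = rtdp_ucb(GRAPH, start=0, goal=GOAL)
    print(greedy_path(GRAPH, V, mean, n, 0, GOAL), V[0], n)
```

In this sketch the transitions are deterministic and only the edge costs are random, which keeps the backup simple; a setting with stochastic transitions would need an expectation over successor nodes in both the index and the backup. Setting `c=0` recovers purely greedy trials, the behavior the abstract identifies as prone to insufficient exploration.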

Taxonomy

Topics: Age of Information Optimization · Transportation and Mobility Innovations · Reinforcement Learning in Robotics