TL;DR
This paper introduces a multi-agent reinforcement learning approach for ride-sharing order dispatch that avoids value function estimation by using one-step group rewards computed across a homogeneous AV fleet.
Contribution
The paper proposes One-Step Policy Optimization (OSPO), which exploits the homogeneity of AV fleets to scale order dispatching to large fleets without estimating a critic.
Findings
OSPO outperforms GRPO in all tested scenarios.
Both methods effectively optimize pickup times and served orders.
Experiments on real-world data validate the approach's efficiency.
Abstract
Order dispatch is a critical task in ride-sharing systems with Autonomous Vehicles (AVs), directly influencing efficiency and profits. Recently, Multi-Agent Reinforcement Learning (MARL) has emerged as a promising solution to this problem by decomposing the large state and action spaces among individual agents, effectively addressing the Curse of Dimensionality (CoD) in the transportation market, which is caused by the substantial number of vehicles, passengers, and orders. However, conventional MARL-based approaches rely heavily on accurate estimation of the value function, which becomes problematic in large-scale, highly uncertain environments. To address this issue, we propose two novel methods that bypass value function estimation, leveraging the homogeneous property of AV fleets. First, we draw an analogy between AV fleets and groups in Group Relative Policy Optimization (GRPO),…
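To make the critic-free idea concrete, the following is a minimal sketch of a GRPO-style group-relative advantage, assuming each vehicle in a homogeneous fleet is treated as one member of a group and receives a scalar one-step reward. The function name, the use of the population standard deviation, and the sample rewards are illustrative assumptions, not the paper's exact formulation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each agent's one-step reward against the group mean and spread.

    No learned value function (critic) is involved: the group statistics
    serve as the baseline. `eps` is an assumed constant that guards
    against division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical one-step rewards for a fleet of four vehicles.
rewards = [1.0, 2.0, 3.0, 4.0]
advantages = group_relative_advantages(rewards)
```

Vehicles whose reward exceeds the fleet average receive a positive advantage and the rest a negative one, so a policy-gradient update can be weighted without estimating a value function.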
