One Step is Enough: Multi-Agent Reinforcement Learning based on One-Step Policy Optimization for Order Dispatch on Ride-Sharing Platforms

Zijian Zhao; Sen Li

arXiv:2507.15351·cs.AI·April 17, 2026

TL;DR

This paper introduces a novel multi-agent reinforcement learning approach for ride-sharing order dispatch that avoids value function estimation by using one-step group rewards, improving efficiency and accuracy.

Contribution

It proposes the One-Step Policy Optimization (OSPO) method, leveraging homogeneous AV fleet properties to enhance large-scale dispatching without critic estimation.

Findings

01. OSPO outperforms GRPO in all tested scenarios.

02. Both methods effectively optimize pickup times and served orders.

03. Experiments on real-world data validate the approach's efficiency.

Abstract

Order dispatch is a critical task in ride-sharing systems with Autonomous Vehicles (AVs), directly influencing efficiency and profits. Recently, Multi-Agent Reinforcement Learning (MARL) has emerged as a promising solution to this problem by decomposing the large state and action spaces among individual agents, effectively addressing the Curse of Dimensionality (CoD) in the transportation market, which is caused by the substantial number of vehicles, passengers, and orders. However, conventional MARL-based approaches rely heavily on accurate estimation of the value function, which becomes problematic in large-scale, highly uncertain environments. To address this issue, we propose two novel methods that bypass value function estimation, leveraging the homogeneous property of AV fleets. First, we draw an analogy between AV fleets and groups in Group Relative Policy Optimization (GRPO),…
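The abstract's analogy between an AV fleet and a GRPO group can be illustrated with a small sketch. It shows only the generic GRPO-style idea of scoring each agent's reward relative to its group's mean and standard deviation, so that no learned value function (critic) is needed as a baseline. It is not taken from the paper or its repository: the function name `group_relative_advantages`, the `eps` constant, the use of the population standard deviation, and the sample rewards are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Return GRPO-style advantages for one group of homogeneous agents.

    Each reward is centered on the group mean and scaled by the group
    standard deviation, so the group itself serves as the baseline
    instead of a learned critic. `eps` (an assumed constant) avoids
    division by zero when every agent receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical one-step rewards for a fleet of four vehicles.
rewards = [1.0, 2.0, 3.0, 4.0]
advantages = group_relative_advantages(rewards)
```

In this sketch, vehicles with above-average rewards get positive advantages and those below average get negative ones, and the advantages sum to approximately zero across the fleet.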

Code & Models

Repositories

RS2002/OSPO (GitHub)
