Scalable Deep Reinforcement Learning for Ride-Hailing
Jiekun Feng, Mark Gluzman, J. G. Dai

TL;DR
This paper introduces a scalable deep reinforcement learning approach for ride-hailing services. It decomposes the action space so that many agents (cars) can be controlled at once, and it demonstrates the benefit in a numerical experiment based on real data from Didi Chuxing.
Contribution
A novel action decomposition method for MDPs in ride-hailing, allowing deep RL to scale to many agents and improve control policy optimization.
Findings
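Assigning tasks to drivers sequentially avoids a joint action space that grows exponentially with the number of cars.
A numerical experiment based on real Didi Chuxing data demonstrates the benefit of the decomposition.
The decomposed action structure enables the use of deep RL algorithms for control policy optimization.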
Abstract
Ride-hailing services, such as Didi Chuxing, Lyft, and Uber, arrange thousands of cars to meet ride requests throughout the day. We consider a Markov decision process (MDP) model of a ride-hailing service system, framing it as a reinforcement learning (RL) problem. The simultaneous control of many agents (cars) presents a challenge for the MDP optimization because the action space grows exponentially with the number of cars. We propose a special decomposition for the MDP actions by sequentially assigning tasks to the drivers. The new action structure resolves the scalability problem and enables the use of deep RL algorithms for control policy optimization. We demonstrate the benefit of our proposed decomposition with a numerical experiment based on real data from Didi Chuxing.
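The scaling argument in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the task names, the `random_policy` placeholder (standing in for a learned policy network), and the function names are all assumptions made for illustration. The sketch contrasts the size of a joint action space, where every car is assigned at once, with a sequential scheme that makes one small decision per car.

```python
import random


def joint_action_count(num_cars: int, num_tasks: int) -> int:
    """Number of joint actions if all cars are assigned a task simultaneously."""
    return num_tasks ** num_cars


def sequential_assign(cars, tasks, policy):
    """Assign a task to each car in turn.

    Each step picks among len(tasks) options and may condition on the
    assignments already made, so a full decision takes len(cars) small
    choices instead of one choice over len(tasks) ** len(cars) options.
    """
    assignments = []
    for car in cars:
        task = policy(car, tasks, assignments)
        assignments.append((car, task))
    return assignments


def random_policy(car, tasks, assignments_so_far):
    """Hypothetical placeholder for a learned policy: picks a task at random."""
    return random.choice(tasks)


cars = list(range(5))
tasks = ["serve_request", "reposition", "idle"]  # hypothetical task set

joint = joint_action_count(len(cars), len(tasks))  # 3 ** 5 = 243
plan = sequential_assign(cars, tasks, random_policy)
```

With 5 cars and 3 tasks the joint space already has 243 actions, while the sequential scheme makes 5 decisions over 3 options each. Under these assumptions, the policy's output size stays fixed as the fleet grows, which is what allows a single policy to be applied car by car.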
