Semi-Markov Reinforcement Learning for City-Scale EV Ride-Hailing with Feasibility-Guaranteed Actions
An Nguyen, Hoang Nguyen, Phuong Le, Hung Pham, Cuong Do, and Laurent El Ghaoui

TL;DR
This paper introduces a semi-Markov decision process (semi-MDP) framework for city-scale EV ride-hailing that keeps every action physically feasible and learns robustly under demand uncertainty. The reported results are a net profit of $1.22M and zero feeder-limit violations.
Contribution
The paper develops a semi-MDP model with feasibility guarantees. The policy learns over high-level intentions, a MILP projection maps those intentions to feasible actions, and learning uses a robust SAC-based agent with graph neural networks.
Findings
PD-RSAC achieves a net profit of $1.22M, outperforming baselines.
The method maintains zero feeder-limit violations.
Experiments demonstrate robustness under demand uncertainty.
Abstract
We study city-scale control of electric-vehicle (EV) ride-hailing fleets where dispatch, repositioning, and charging decisions must respect charger and feeder limits under uncertain, spatially correlated demand and travel times. We formulate the problem as a hex-grid semi-Markov decision process (semi-MDP) with mixed actions (discrete actions for serving, repositioning, and charging, together with continuous charging power) and variable action durations. To guarantee physical feasibility during both training and deployment, the policy learns over high-level intentions produced by a masked, temperature-annealed actor. These intentions are projected at every decision step through a time-limited rolling mixed-integer linear program (MILP) that strictly enforces state-of-charge, port, and feeder constraints. To mitigate distributional shifts, we optimize a Soft Actor-Critic (SAC) agent…
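The "masked, temperature-annealed actor" mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the linear annealing schedule, and the default values of `t_start`, `t_end`, and `decay_steps` are all assumptions chosen for illustration. The sketch only shows the general idea of zeroing out infeasible intentions before sampling and sharpening the distribution as the temperature decreases.

```python
import math


def masked_softmax(logits, mask, temperature):
    """Softmax restricted to feasible actions (hypothetical helper).

    logits: raw actor scores, one per discrete intention.
    mask: booleans, True where the intention is feasible.
    temperature: positive scalar; lower values sharpen the distribution.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if not any(mask):
        raise ValueError("at least one action must be feasible")
    # Subtract the largest feasible scaled logit for numerical stability.
    peak = max(l / temperature for l, m in zip(logits, mask) if m)
    exps = [math.exp(l / temperature - peak) if m else 0.0
            for l, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def annealed_temperature(step, t_start=1.0, t_end=0.1, decay_steps=1000):
    """Linear schedule from t_start to t_end (an assumed schedule)."""
    frac = min(max(step / decay_steps, 0.0), 1.0)
    return t_start + frac * (t_end - t_start)


if __name__ == "__main__":
    # Three intentions: serve, reposition, charge. Repositioning is masked out.
    logits = [1.0, 2.0, 3.0]
    mask = [True, False, True]
    for step in (0, 500, 1000):
        temp = annealed_temperature(step)
        print(step, round(temp, 3), masked_softmax(logits, mask, temp))
```

In this sketch a masked intention always receives probability zero, so an infeasible choice is never sampled. According to the abstract, the paper additionally projects the sampled intentions through a rolling MILP, which this example does not attempt to reproduce.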
