Semi-Markov Reinforcement Learning for City-Scale EV Ride-Hailing with Feasibility-Guaranteed Actions

An Nguyen; Hoang Nguyen; Phuong Le; Hung Pham; Cuong Do; and Laurent El Ghaoui

arXiv:2604.25848·cs.AI·April 29, 2026

TL;DR

This paper formulates city-scale EV ride-hailing control as a semi-Markov decision process, projects every learned action through a MILP so that it is physically feasible, and trains robustly under demand uncertainty. The proposed PD-RSAC method reports a net profit of $1.22M with zero feeder-limit violations.

Contribution

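The paper develops a semi-MDP model with feasibility guarantees: a masked, temperature-annealed actor outputs high-level intentions, a time-limited rolling MILP projects them onto actions that satisfy state-of-charge, port, and feeder constraints, and the policy is learned with a robust SAC-based approach that uses graph neural networks.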

Findings

01 PD-RSAC achieves a net profit of $1.22M, outperforming baselines.

02 The method maintains zero feeder-limit violations.

03 Experiments demonstrate robustness under demand uncertainty.

Abstract

We study city-scale control of electric-vehicle (EV) ride-hailing fleets where dispatch, repositioning, and charging decisions must respect charger and feeder limits under uncertain, spatially correlated demand and travel times. We formulate the problem as a hex-grid semi-Markov decision process (semi-MDP) with mixed actions (discrete actions for serving, repositioning, and charging, together with continuous charging power) and variable action durations. To guarantee physical feasibility during both training and deployment, the policy learns over high-level intentions produced by a masked, temperature-annealed actor. These intentions are projected at every decision step through a time-limited rolling mixed-integer linear program (MILP) that strictly enforces state-of-charge, port, and feeder constraints. To mitigate distributional shifts, we optimize a Soft Actor-Critic (SAC) agent…
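Two ideas in the abstract can be illustrated with a minimal Python sketch: a masked, temperature-annealed distribution over discrete intentions, and a bootstrap target that accounts for variable action durations. This is not the authors' implementation. The function names, the linear annealing schedule, the default parameter values, and the textbook semi-MDP target (discounting the next-state value by gamma raised to the action duration) are all assumptions made for illustration, and the MILP projection step is not shown.

```python
import math


def masked_softmax(logits, feasible, temperature):
    """Softmax restricted to feasible intentions; infeasible ones get probability 0."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if not any(feasible):
        raise ValueError("at least one intention must be feasible")
    scaled = [l / temperature for l in logits]
    # Subtract the largest feasible logit for numerical stability.
    peak = max(s for s, ok in zip(scaled, feasible) if ok)
    weights = [math.exp(s - peak) if ok else 0.0 for s, ok in zip(scaled, feasible)]
    total = sum(weights)
    return [w / total for w in weights]


def annealed_temperature(step, t_start=1.0, t_end=0.1, decay_steps=1000):
    """Hypothetical linear schedule from t_start down to t_end."""
    frac = min(max(step / decay_steps, 0.0), 1.0)
    return t_start + frac * (t_end - t_start)


def semi_mdp_target(reward, duration, next_value, gamma=0.99):
    """Textbook semi-MDP target: bootstrap value discounted by gamma ** duration."""
    return reward + (gamma ** duration) * next_value


if __name__ == "__main__":
    logits = [1.0, 2.0, 3.0]          # hypothetical scores: serve, reposition, charge
    feasible = [True, False, True]    # e.g. repositioning masked out
    for step in (0, 500, 1000):
        temp = annealed_temperature(step)
        print(step, round(temp, 3), masked_softmax(logits, feasible, temp))
```

As the temperature falls, the distribution concentrates on the highest-scoring feasible intention, while masked intentions keep probability zero at every temperature.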
