RAST-MoE-RL: A Regime-Aware Spatio-Temporal MoE Framework for Deep Reinforcement Learning in Ride-Hailing
Yuhan Tang, Kangxin Cui, Jung Ho Park, Yibo Zhao, Xuan Jiang, Haoze He, Jiangbo Yu, Haris Koutsopoulos, Jinhua Zhao

TL;DR
This paper introduces RAST-MoE-RL, a regime-aware spatio-temporal mixture-of-experts (MoE) framework for deep reinforcement learning in ride-hailing, aimed at improving efficiency and robustness under dynamic, uncertain supply-demand conditions.
Contribution
It proposes a novel MoE-based RL architecture that captures non-stationary demand patterns and congestion effects more effectively than existing shallow models.
Findings
Reduces average matching delay by 10% on Uber data.
Decreases pickup delay by 15%.
Demonstrates robustness to unseen demand regimes.
Abstract
Ride-hailing platforms face the challenge of balancing passenger waiting times with overall system efficiency under highly uncertain supply-demand conditions. Adaptive delayed matching, which controls the holding intervals for batched sets of requests and vehicles, reveals an inherent trade-off between matching and pickup delays. The resulting environment, with temporally varying request arrival patterns and dynamic congestion, calls for more expressive networks with sufficient capacity to capture its non-stationarity. To address the limitations of existing methods, which rely on shallow encoders unable to capture dynamic supply-demand patterns and congestion effects, we introduce the Regime-Aware Spatio-Temporal Mixture-of-Experts (RAST-MoE) framework, which formalizes adaptive delayed matching as a regime-aware Markov Decision Process and equips RL agents with a self-attention MoE…
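The trade-off between matching delay and pickup delay that the abstract describes can be illustrated with a small toy simulation. This is only a sketch under loud assumptions and is not the paper's simulator, data, or method: requests and vehicles appear at uniformly random times on a one-dimensional road, vehicles never return after a match, the holding interval is fixed rather than chosen by an RL agent, and matching is greedy nearest-vehicle. The function `simulate` and all of its parameters (`hold`, `n`, `horizon`, `length`, `speed`, `seed`) are hypothetical names introduced here.

```python
import math
import random
from statistics import mean


def simulate(hold, n=200, horizon=600.0, length=1000.0, speed=10.0, seed=0):
    """Toy 1-D batched matching with a fixed holding interval `hold` (seconds).

    Returns (mean matching delay, mean pickup delay, number of matches).
    All modelling choices here are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Each entry is (time it becomes available in s, position in m), sorted by time.
    requests = sorted((rng.uniform(0, horizon), rng.uniform(0, length)) for _ in range(n))
    vehicles = sorted((rng.uniform(0, horizon), rng.uniform(0, length)) for _ in range(n))

    waiting, idle = [], []
    match_delays, pickup_delays = [], []
    r = v = 0
    # One extra batch after the horizon so every request is eventually matched.
    n_batches = math.ceil(horizon / hold) + 1
    for k in range(1, n_batches + 1):
        now = k * hold
        # Release everything that became available since the previous batch.
        while r < n and requests[r][0] <= now:
            waiting.append(requests[r])
            r += 1
        while v < n and vehicles[v][0] <= now:
            idle.append(vehicles[v])
            v += 1
        # Greedy assignment: each waiting request takes the nearest idle vehicle.
        still_waiting = []
        for arrived, pos in waiting:
            if not idle:
                still_waiting.append((arrived, pos))
                continue
            nearest = min(range(len(idle)), key=lambda i: abs(idle[i][1] - pos))
            _, vehicle_pos = idle.pop(nearest)
            match_delays.append(now - arrived)                   # time held before dispatch
            pickup_delays.append(abs(vehicle_pos - pos) / speed)  # travel time to rider
        waiting = still_waiting
    return mean(match_delays), mean(pickup_delays), len(match_delays)


if __name__ == "__main__":
    for hold in (5, 30, 120):
        matching, pickup, _ = simulate(hold)
        print(f"hold={hold:>3}s  matching delay={matching:6.1f}s  pickup delay={pickup:6.1f}s")
```

In a setup like this, longer holding intervals would typically raise the matching delay while giving each request more nearby vehicles to choose from. The exact numbers depend on the seed and on the assumptions above, so they should not be read as evidence about the paper's results.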
