H-TD2: Hybrid Temporal Difference Learning for Adaptive Urban Taxi Dispatch
Benjamin Rivière, Soon-Jo Chung

TL;DR
H-TD2 is an adaptive, scalable hybrid reinforcement learning algorithm for taxi dispatch that balances local and centralized decision-making, significantly reducing customer wait times in urban environments.
Contribution
The paper introduces a novel hybrid temporal difference learning algorithm that combines distributed and centralized updates with bounded sub-optimality for urban taxi dispatch.
Findings
Decreases average customer waiting time by 50% in Gridworld simulations.
Reduces average customer waiting time by 26% in real Chicago taxi data.
Provides a scalable, adaptive dispatch policy robust to domain changes.
Abstract
We present H-TD2: Hybrid Temporal Difference Learning for Taxi Dispatch, a model-free, adaptive decision-making algorithm to coordinate a large fleet of automated taxis in a dynamic urban environment to minimize expected customer waiting times. Our scalable algorithm exploits the natural transportation network company topology by switching between two behaviors: distributed temporal-difference learning computed locally at each taxi and infrequent centralized Bellman updates computed at the dispatch center. We derive a regret bound and design the trigger condition between the two behaviors to explicitly control the trade-off between computational complexity and the individual taxi policy's bounded sub-optimality; this advances the state of the art by enabling distributed operation with bounded-suboptimality. Additionally, unlike recent reinforcement learning dispatch methods, this policy…
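The switching behavior described in the abstract can be illustrated with a toy tabular sketch. This is not the authors' implementation: the functions `td0_update`, `bellman_update`, and `hybrid_evaluate`, the accumulated-TD-error trigger, and the `threshold` value are all hypothetical choices made for illustration. The sketch also assumes the dispatch center can evaluate a known transition matrix `P` and reward vector `R`, which the abstract does not state.

```python
import numpy as np

def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """One local TD(0) step on the value table V; returns the TD error."""
    delta = r + gamma * V[s_next] - V[s]
    V[s] += alpha * delta
    return delta

def bellman_update(P, R, gamma=0.9, iters=200):
    """Centralized policy evaluation by repeated Bellman backups.

    Assumes a model (P, R) is available at the center (illustrative only).
    """
    V = np.zeros(len(R))
    for _ in range(iters):
        V = R + gamma * P @ V
    return V

def hybrid_evaluate(P, R, steps=500, threshold=5.0, seed=0):
    """Run cheap local TD steps; fall back to a central update when triggered.

    The trigger (accumulated absolute TD error above `threshold`) is a
    hypothetical stand-in for the paper's trigger condition.
    """
    rng = np.random.default_rng(seed)
    n = len(R)
    V = np.zeros(n)
    s, drift, n_central = 0, 0.0, 0
    for _ in range(steps):
        s_next = rng.choice(n, p=P[s])
        drift += abs(td0_update(V, s, R[s], s_next))
        if drift > threshold:
            V = bellman_update(P, R)  # infrequent, expensive correction
            drift = 0.0
            n_central += 1
        s = s_next
    return V, n_central

# Three-state example; rewards are negative waiting-time costs.
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])
R = np.array([-1.0, -2.0, -3.0])
V, n_central = hybrid_evaluate(P, R)
print(V, n_central)
```

Raising `threshold` makes central updates rarer at the cost of letting the local estimate drift further, which mirrors, in a simplified way, the trade-off between computational complexity and sub-optimality that the abstract describes.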
Taxonomy
Topics: Transportation and Mobility Innovations · Smart Grid Energy Management · Reinforcement Learning in Robotics
