Equilibrium Inverse Reinforcement Learning for Ride-hailing Vehicle Network
Takuma Oda

TL;DR
This paper introduces an equilibrium inverse reinforcement learning framework for ride-hailing networks, enabling robust driver behavior modeling and efficient policy computation in multi-agent settings with real-world data.
Contribution
It develops a novel equilibrium inverse reinforcement learning algorithm for multi-agent ride-hailing scenarios that learns transferable driver reward functions and computes policies quickly.
Findings
Outperforms baselines in imitation accuracy on real-world data
Computational time is independent of the number of agents
Robust to changes in supply-demand distributions and data quality
Abstract
Ubiquitous mobile computing has enabled ride-hailing services to collect vast amounts of behavioral data on riders and drivers and to optimize supply and demand matching in real time. While these mobility service providers have some degree of control over the market by assigning vehicles to requests, they need to deal with the uncertainty arising from self-interested driver behavior, since workers are usually free to drive when they are not assigned tasks. In this work, we formulate the problem of passenger-vehicle matching in a sparsely connected graph and propose an algorithm to derive an equilibrium policy in a multi-agent environment. Our framework combines value iteration methods, which estimate the optimal policy given expected state visitation, with policy propagation, which computes multi-agent state visitation frequencies. Furthermore, we developed a method to learn the driver's reward…
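The alternation the abstract describes, value iteration given expected state visitation followed by policy propagation to recompute that visitation, can be illustrated with a minimal sketch. This is not the paper's algorithm: the congestion-style reward `demand / (1 + visits)`, the soft (maximum-entropy) value iteration, the damping `step`, and all function names are assumptions chosen for illustration. Because supply is tracked as expected vehicle mass per node rather than as individual agents, the cost of this sketch depends on graph size and horizon, not on the number of vehicles.

```python
import numpy as np

def soft_value_iteration(reward, adjacency, horizon):
    """Finite-horizon soft value iteration on a graph (illustrative assumption).

    reward[t, s2] is the reward for arriving at node s2 after the move made at
    step t. An action is a move along an edge of `adjacency`; self-loops mean
    "stay". Every node must have at least one outgoing edge.
    Returns policy[t, s, s2] = probability of moving from s to s2 at step t.
    """
    n = adjacency.shape[0]
    mask = np.where(adjacency > 0, 0.0, -np.inf)  # forbid non-edges
    value = np.zeros(n)
    policy = np.zeros((horizon, n, n))
    for t in reversed(range(horizon)):
        q = reward[t][None, :] + value[None, :] + mask  # q[s, s2]
        qmax = q.max(axis=1, keepdims=True)
        logz = qmax + np.log(np.exp(q - qmax).sum(axis=1, keepdims=True))
        policy[t] = np.exp(q - logz)  # softmax over reachable neighbours
        value = logz[:, 0]
    return policy

def propagate_visitation(policy, initial):
    """Push expected vehicle mass forward through the time-dependent policy."""
    horizon, n, _ = policy.shape
    visits = np.zeros((horizon + 1, n))
    visits[0] = initial
    for t in range(horizon):
        visits[t + 1] = visits[t] @ policy[t]
    return visits

def equilibrium_policy(demand, adjacency, initial, horizon, iters=200, step=0.5):
    """Damped fixed-point iteration between policy and visitation (a sketch)."""
    visits = np.tile(initial, (horizon + 1, 1))
    for _ in range(iters):
        # Hypothetical reward: demand at a node is shared among expected supply.
        reward = demand[None, :] / (1.0 + visits[1:])
        policy = soft_value_iteration(reward, adjacency, horizon)
        new_visits = propagate_visitation(policy, initial)
        visits = (1.0 - step) * visits + step * new_visits  # damping
    return policy, visits

# Toy example: three nodes on a line, each with a self-loop.
adjacency = np.array([[1, 1, 0], [1, 1, 1], [0, 1, 1]])
demand = np.array([1.0, 0.2, 3.0])
initial = np.array([5.0, 5.0, 5.0])
policy, visits = equilibrium_policy(demand, adjacency, initial, horizon=4)
```

Here `policy[t]` is a row-stochastic matrix over reachable neighbours and `visits[t]` is the expected number of vehicles at each node at step `t`. Damping is one simple way to stabilise the fixed-point iteration; convergence is not guaranteed in general.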
Taxonomy
Topics: Transportation and Mobility Innovations · Sharing Economy and Platforms · Transportation Planning and Optimization
