Massively Scalable Inverse Reinforcement Learning in Google Maps
Matt Barnes, Matthew Abueg, Oliver F. Lange, Matt Deeds, Jason Trader, Denali Molitor, Markus Wulfmeier, Shawn O'Banion

TL;DR
This paper introduces scalable inverse reinforcement learning techniques for planetary-scale route optimization in Google Maps, achieving a 16-24% improvement in route quality and addressing computational challenges at unprecedented scale.
Contribution
It presents novel scaling methods and a new IRL algorithm (RHIP), and reports the largest real-world IRL study, with substantial performance gains.
Findings
16-24% improvement in route quality
Achieved planetary-scale IRL with hundreds of millions of states
Identified trade-offs between deterministic and stochastic planning
Abstract
Inverse reinforcement learning (IRL) offers a powerful and general framework for learning humans' latent preferences in route recommendation, yet no approach has successfully addressed planetary-scale problems with hundreds of millions of states and demonstration trajectories. In this paper, we introduce scaling techniques based on graph compression, spatial parallelization, and improved initialization conditions inspired by a connection to eigenvector algorithms. We revisit classic IRL methods in the routing context, and make the key observation that there exists a trade-off between the use of cheap, deterministic planners and expensive yet robust stochastic policies. This insight is leveraged in Receding Horizon Inverse Planning (RHIP), a new generalization of classic IRL algorithms that provides fine-grained control over performance trade-offs via its planning horizon. Our…
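The abstract describes a trade-off between cheap deterministic planners and more expensive stochastic (MaxEnt-style) policies, a planning horizon that moves between the two, and improved initialization conditions. The Python sketch below illustrates that idea on a hypothetical four-node graph under our own simplifying assumptions. The reward is the negative edge cost. The "horizon" is modeled as the number of soft Bellman backups applied on top of Dijkstra cost-to-go values. The improved initialization is taken to mean starting the MaxEnt backward pass from those deterministic values instead of the classic start (partition value 1 at the destination, 0 elsewhere). In exponentiated form, Z = exp(V), each backup is a matrix-vector product reminiscent of power iteration; we assume, without confirmation from this summary, that this relates to the eigenvector connection the abstract mentions. All names (`deterministic_values`, `soft_backup`, `receding_horizon_values`, and so on) are hypothetical. The sketch omits the expected-visitation (forward) pass, graph compression, and spatial parallelization, so it is not the paper's RHIP implementation.

```python
import heapq
import math

# Hypothetical toy road graph (not from the paper): EDGES[u] = [(v, cost), ...].
# Costs are positive; the per-edge reward is assumed to be r(u, v) = -cost.
EDGES = {
    0: [(1, 1.0), (2, 2.5)],
    1: [(2, 1.0), (3, 3.0)],
    2: [(3, 1.0), (1, 1.5)],
    3: [],  # destination, treated as absorbing
}
DEST = 3


def logsumexp(xs):
    """Numerically stable log(sum(exp(x))) that tolerates -inf entries."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def deterministic_values(edges, dest):
    """Cheap deterministic planner: Dijkstra from the destination over reversed
    edges, returning V_det(s) = -(shortest-path cost from s to dest)."""
    reverse = {u: [] for u in edges}
    for u, outs in edges.items():
        for v, c in outs:
            reverse[v].append((u, c))
    dist = {u: math.inf for u in edges}
    dist[dest] = 0.0
    heap = [(0.0, dest)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue  # stale heap entry
        for u, c in reverse[v]:
            if d + c < dist[u]:
                dist[u] = d + c
                heapq.heappush(heap, (d + c, u))
    return {u: -dist[u] for u in edges}


def classic_init(edges, dest):
    """Classic MaxEnt backward-pass start: Z = 1 at the destination and 0
    elsewhere, i.e. V = 0 at dest and -inf everywhere else."""
    return {u: (0.0 if u == dest else -math.inf) for u in edges}


def soft_backup(edges, dest, values):
    """One MaxEnt-style soft Bellman backup:
    V(s) = log sum_{(s, v)} exp(-cost(s, v) + V(v)); the destination stays fixed."""
    new = {}
    for s, outs in edges.items():
        if s == dest or not outs:
            new[s] = values[s]  # absorbing destination (or dead end) keeps its value
        else:
            new[s] = logsumexp([-c + values[v] for v, c in outs])
    return new


def soft_iterate(edges, dest, values, steps):
    """Apply `steps` soft backups starting from `values`."""
    for _ in range(steps):
        values = soft_backup(edges, dest, values)
    return values


def receding_horizon_values(edges, dest, horizon):
    """Hypothetical horizon-H backward pass: deterministic values refined by H
    soft backups. horizon=0 returns the Dijkstra values unchanged; a large
    horizon approaches the MaxEnt soft values."""
    return soft_iterate(edges, dest, deterministic_values(edges, dest), horizon)


def soft_policy(edges, values, s):
    """Stochastic policy pi(v | s) proportional to exp(-cost(s, v) + V(v))."""
    logits = {v: -c + values[v] for v, c in edges[s]}
    z = logsumexp(list(logits.values()))
    return {v: math.exp(x - z) for v, x in logits.items()}


if __name__ == "__main__":
    for h in (0, 1, 5, 50):
        vals = receding_horizon_values(EDGES, DEST, h)
        print(h, {s: round(v, 3) for s, v in vals.items()})
    print(soft_policy(EDGES, receding_horizon_values(EDGES, DEST, 50), 0))
```

With `horizon=0` the values equal the deterministic planner's. Each extra backup raises them monotonically toward the soft (MaxEnt) fixed point, at the cost of one more pass over every edge. Because both starting points lie below that fixed point and the backup is monotone, in this toy setting the Dijkstra start is never further from the fixed point than the classic start after the same number of backups.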
Peer Reviews
Decision: ICLR 2024 spotlight
* A compelling problem
* Real-world empirical experimental problem considered
* The paper does a good job straddling both novel theory advancements, and practical and engineering advancements, but presents the findings appropriately for the ICLR audience.
* The connections with graph theoretic results (App. A1, A2, and Theorem B3) are useful and insightful.
* The paper and appendices include negative results, in addition to the main results - this is encouraging to see (more papers should do
* The literature review is compact and the theory background provides a rapid but very nice summary of classical IRL results (in particular the unifying view of stochastic vs. deterministic policy trade-offs is helpful). One relevant piece of prior work that isn't mentioned however is the improved MaxEnt approach(es) by Snoswell et al. (e.g. [B, C]) - which address theoretical and empirical limitations with Ziebart's MaxEnt model, and are specifically applied to the problem of route optimization
1. The authors address a well-motivated and useful application to show the statistically significant gains obtained from scalable IRL in route recommendation. The techniques that worked for this task have been clearly explained, along with explanations and evidence for some techniques that didn't work.
2. The proposed method unifies several prior IRL algorithms through the RHIP framework for trading-off quality of route recommendation with convergence speed. This helps improve understanding of
1. The experimental results are not from real-time execution of the proposed method and utilizes static features of the road network for route optimization. Incorporating dynamic features, for example varying traffic flow throughout a day, planned or unplanned diversions and road closures etc. would increase the difficulty of obtaining a scalable DP approach.
2. The reward function is learning a scalar value, whereas in the real world for applications like route optimization, it should intuit
The paper addresses an interesting problem. Learning with very large-scale routing datasets would have significant applications in modern transportation systems. The techniques used in the paper (except for MaxEnt, as I will discuss in the Weaknesses) are sound and relevant. The algorithm seems to work well (but again, the experiments lack comparisons with more scalable IRL algorithms, as I will discuss later).
My biggest concern is that the paper primarily revolves around MaxEnt, which was developed about 15 years ago and is now very outdated. In the introduction, the authors state that MaxEnt is limited in its scalability, which is true. Recent literature on IRL has introduced many advanced algorithms to address this issue. For instance, Adversarial IRL [1], IQ-Learn [2], and ValueDICE [3] are well-known recent IRL algorithms that are much more scalable. Therefore, it is crucial to focus on these alg
Taxonomy
Topics: Data Management and Algorithms
