Trajeglish: Traffic Modeling as Next-Token Prediction
Jonah Philion, Xue Bin Peng, Sanja Fidler

TL;DR
This paper introduces Trajeglish, which models traffic scenarios by next-token prediction with a GPT-like model and achieves state-of-the-art realism and interaction metrics on the Waymo Sim Agents Benchmark.
Contribution
It presents a novel discrete tokenization scheme and an autoregressive modeling framework for multi-agent traffic scenarios, surpassing prior work on the Waymo Sim Agents Benchmark in realism and interaction metrics.
Findings
Outperforms previous models on Waymo Sim Agents Benchmark
Shows adaptability of learned representations to nuScenes data
Analyzes the impact of context length and intra-timestep interactions
Abstract
A longstanding challenge for self-driving development is simulating dynamic driving scenarios seeded from recorded driving logs. In pursuit of this functionality, we apply tools from discrete sequence modeling to model how vehicles, pedestrians and cyclists interact in driving scenarios. Using a simple data-driven tokenization scheme, we discretize trajectories to centimeter-level resolution using a small vocabulary. We then model the multi-agent sequence of discrete motion tokens with a GPT-like encoder-decoder that is autoregressive in time and takes into account intra-timestep interaction between agents. Scenarios sampled from our model exhibit state-of-the-art realism; our model tops the Waymo Sim Agents Benchmark, surpassing prior work along the realism meta metric by 3.3% and along the interaction metric by 9.9%. We ablate our modeling choices in full autonomy and partial autonomy…
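The tokenization idea in the abstract can be illustrated with a minimal sketch. This is not the authors' scheme: the vocabulary here is a hypothetical uniform grid of (dx, dy) displacements in a fixed frame with heading ignored, whereas the paper uses its own data-driven vocabulary. The function names `make_vocab`, `tokenize`, and `detokenize` are assumptions made for illustration. The one property the sketch tries to show is that choosing each token relative to the reconstructed position, rather than the ground-truth one, keeps discretization error from accumulating over time.

```python
import numpy as np

def make_vocab(max_step=2.0, n=9):
    """Hypothetical vocabulary: an n x n grid of (dx, dy) displacements.

    Assumption for illustration only; the paper builds its vocabulary from data.
    """
    axis = np.linspace(-max_step, max_step, n)
    dx, dy = np.meshgrid(axis, axis, indexing="ij")
    return np.stack([dx.ravel(), dy.ravel()], axis=1)  # shape (n*n, 2)

def tokenize(traj, vocab):
    """Greedily map a (T, 2) trajectory to a list of T-1 token ids.

    Each token is the vocabulary entry closest to the displacement from the
    *reconstructed* position to the next ground-truth position.
    """
    pos = np.asarray(traj[0], dtype=float)
    tokens = []
    for target in traj[1:]:
        residual = target - pos
        idx = int(np.argmin(np.linalg.norm(vocab - residual, axis=1)))
        tokens.append(idx)
        pos = pos + vocab[idx]  # advance the reconstruction, not the ground truth
    return tokens

def detokenize(start, tokens, vocab):
    """Rebuild a (T, 2) trajectory from a start position and token ids."""
    steps = vocab[np.asarray(tokens, dtype=int)]
    return np.vstack([start, start + np.cumsum(steps, axis=0)])

if __name__ == "__main__":
    vocab = make_vocab()
    rng = np.random.default_rng(0)
    steps = rng.uniform(-1.0, 1.0, size=(50, 2))
    traj = np.vstack([np.zeros(2), np.cumsum(steps, axis=0)])
    recon = detokenize(traj[0], tokenize(traj, vocab), vocab)
    print("max abs error:", np.max(np.abs(recon - traj)))
```

With this grid (spacing 0.5) and per-step displacements bounded by 1 in each axis, the per-axis reconstruction error stays within half a grid cell at every timestep. A sequence model would then be trained on the resulting token ids, one per agent per timestep.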
Peer Reviews
Decision: ICLR 2024 poster
The key strengths of this work lie in its conceptual and architectural simplicity in comparison to existing methods. The idea is well-motivated and the presentation is clear. Besides this, the paper provides a detailed experimental analysis on different aspects of the proposed design space.
1. The benchmarking in Table 1 follows a much simpler setting with fewer max agents (24 vs. 128) and a shorter time horizon (6 seconds vs. 8 seconds) than prior work on WOMD [1,2,3].
2. As a result of this simpler benchmark and missing comparisons to any prior architecture, this paper does not address the key question of whether the proposed method is competitive with the current state-of-the-art despite its simplicity. At a glance, it seems to be much worse, with a minADE >3m in comparison to th
* Strong tokenizer: k-disks outperforms kMeans baselines with low discretization errors and a convincing ablation study
* Autoregressive and causal rollouts
* Experiments demonstrating the benefits of intra-timestep dependence of agents
* Experiments demonstrating the transfer to nuScenes
* Missing WOMD baseline results from other models
* Similar contributions as the recently published “MotionLM: Multi-Agent Motion Forecasting as Language Modeling” (https://arxiv.org/pdf/2309.16534.pdf)
1. The idea of tokenization using a small vocabulary is moderately novel.
2. The visualization and illustration are well made and help the readers to understand the paper.
Major:
1. The motivation for using tokenization (compared with using the actual values, as in most existing work in Appendix B) is not very clear.
2. The experimental results are not very impressive: (1) Improvements in Table 1 seem quite small. Can you show standard deviations for the results? (2) The evaluation covers only open-loop simulation, not closed-loop simulation. (3) The baseline details are not given (e.g., “The “marginal” baseline is an equally important baseline designed to mi
Taxonomy
Topics: Autonomous Vehicle Technology and Safety · Traffic Prediction and Management Techniques · Time Series Analysis and Forecasting
