Scaling Laws of Motion Forecasting and Planning -- Technical Report

Mustafa Baniodeh; Kratarth Goel; Scott Ettinger; Carlos Fuertes; Ari Seff; Tim Shen; Cole Gulino; Chenjie Yang; Ghassen Jerfel; Dokook Choe; Rui Wang; Benjamin Charrow; Vinutha Kallem; Sergio Casas; Rami Al-Rfou; Benjamin Sapp; Dragomir Anguelov

arXiv:2506.08228 · cs.LG · September 9, 2025

TL;DR

This paper investigates how scaling transformer models affects motion forecasting and planning in autonomous driving, revealing power-law improvements and optimal scaling strategies for model size and data.

Contribution

The paper provides empirical scaling laws for transformer-based motion models, relating compute, model size, and data to performance on autonomous driving tasks.

Findings

01 Model performance improves as a power law in the total training compute budget.

02 As the training compute budget grows, the compute-optimal model size increases 1.5x as fast as the dataset size.

03 Sampling and clustering enable smaller models to match larger ones at inference time.
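
The second finding can be made concrete with a small sketch. It rests on two assumptions that are not stated in this summary: that "1.5x as fast" means the model-size exponent is 1.5 times the data exponent, and that training compute is proportional to the product of model size and dataset size (a convention borrowed from language-model scaling work). The function names are hypothetical.

```python
def split_exponents(ratio: float = 1.5) -> tuple[float, float]:
    """Return (model_exponent, data_exponent).

    Assumes N_opt ~ C**a and D_opt ~ C**b with a = ratio * b, and
    assumes C ~ N * D so that a + b = 1. Both are illustrative
    assumptions, not values reported in the summary above.
    """
    data_exp = 1.0 / (1.0 + ratio)
    model_exp = ratio * data_exp
    return model_exp, data_exp


def scale_factors(compute_multiplier: float, ratio: float = 1.5) -> tuple[float, float]:
    """Multiply compute by `compute_multiplier`; return (model, data) growth factors."""
    model_exp, data_exp = split_exponents(ratio)
    return compute_multiplier ** model_exp, compute_multiplier ** data_exp


if __name__ == "__main__":
    a, b = split_exponents()
    print(f"model exponent = {a:.2f}, data exponent = {b:.2f}")
    n_factor, d_factor = scale_factors(10.0)
    print(f"10x compute -> model x{n_factor:.2f}, data x{d_factor:.2f}")
```

Under these assumptions the exponents come out to 0.6 for model size and 0.4 for data, so a tenfold compute increase would go mostly into a larger model. The exponents fitted in the paper itself may differ.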

Abstract

We study the empirical scaling laws of a family of encoder-decoder autoregressive transformer models on the task of joint motion forecasting and planning in the autonomous driving domain. Using a 500-thousand-hour driving dataset, we demonstrate that, similar to language modeling, model performance improves as a power-law function of the total compute budget, and we observe a strong correlation between model training loss and model evaluation metrics. Most interestingly, closed-loop metrics also improve with scaling, which has important implications for the suitability of open-loop metrics for model development and hill climbing. We also study the optimal scaling of the number of transformer parameters and the training data size for a training compute-optimal model. We find that as the training compute budget grows, optimal scaling requires increasing the model size 1.5x as fast as the…
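
As an illustration of what "a power-law function of the total compute budget" means in practice, here is a minimal sketch that fits L = a * C^b by ordinary least squares in log-log space. The data points are synthetic and chosen only for the example; the coefficient 5.0, the exponent -0.1, and the function name `fit_power_law` are assumptions, not values or code from the paper.

```python
import math


def fit_power_law(compute: list[float], loss: list[float]) -> tuple[float, float]:
    """Fit loss = a * compute**b via least squares on (log compute, log loss).

    Returns (a, b). A power law is a straight line in log-log space,
    so the slope is the exponent b and exp(intercept) is the prefactor a.
    """
    xs = [math.log(c) for c in compute]
    ys = [math.log(v) for v in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), slope


# Synthetic, noise-free points generated from a known power law (illustrative only).
compute = [1e15, 1e16, 1e17, 1e18]
loss = [5.0 * c ** -0.1 for c in compute]

a, b = fit_power_law(compute, loss)
print(f"fitted prefactor a = {a:.3f}, exponent b = {b:.3f}")
```

With real training runs the points would be noisy, and the fit would typically be made on the lowest loss reached at each compute budget rather than on every run.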

Taxonomy

Topics: Autonomous Vehicle Technology and Safety · Multimodal Machine Learning Applications · Human Motion and Animation