GAIA-1: A Generative World Model for Autonomous Driving
Anthony Hu, Lloyd Russell, Hudson Yeo, Zak Murez, George Fedoseev, Alex Kendall, Jamie Shotton, Gianluca Corrado

TL;DR
GAIA-1 is a generative world model for autonomous driving that takes video, text, and action inputs and generates realistic driving scenarios, with the aim of improving scene understanding and training efficiency.
Contribution
It presents a novel unsupervised sequence modeling approach that captures high-level scene structures and dynamics for autonomous driving applications.
Findings
Learns high-level scene structures and dynamics
Generates realistic driving scenarios
Improves training processes for autonomous driving
Abstract
Autonomous driving promises transformative improvements to transportation, but building systems capable of safely navigating the unstructured complexity of real-world scenarios remains challenging. A critical problem lies in effectively predicting the various potential outcomes that may emerge in response to the vehicle's actions as the world evolves. To address this challenge, we introduce GAIA-1 ('Generative AI for Autonomy'), a generative world model that leverages video, text, and action inputs to generate realistic driving scenarios while offering fine-grained control over ego-vehicle behavior and scene features. Our approach casts world modeling as an unsupervised sequence modeling problem by mapping the inputs to discrete tokens, and predicting the next token in the sequence. Emerging properties from our model include learning high-level structures and scene dynamics,…
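The sequence-modeling framing in the abstract (map inputs to discrete tokens, then predict the next token) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not GAIA-1's architecture: each "frame" is a single scalar, `quantize` is a uniform binning stand-in for a learned tokenizer, text tokens are left out for brevity, and `BigramModel` is a simple count-based predictor swapped in for a learned sequence model. All names are hypothetical.

```python
from collections import Counter, defaultdict


def quantize(value, low, high, bins):
    """Map a continuous value in [low, high] to an integer bin id in [0, bins)."""
    clipped = min(max(value, low), high)
    idx = int((clipped - low) / (high - low) * bins)
    return min(idx, bins - 1)  # keep value == high inside the last bin


def build_sequence(frames, actions, frame_bins=8, action_bins=4):
    """Interleave frame and action tokens into one flat token sequence.

    Frame tokens use ids [0, frame_bins); action tokens are offset by
    frame_bins so the two modalities never share an id (an assumption
    of this sketch, not a detail taken from the paper).
    """
    tokens = []
    for frame, action in zip(frames, actions):
        tokens.append(quantize(frame, 0.0, 1.0, frame_bins))
        tokens.append(frame_bins + quantize(action, -1.0, 1.0, action_bins))
    return tokens


class BigramModel:
    """Count-based next-token predictor (toy stand-in for a learned model)."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def fit(self, tokens):
        # Unsupervised: the only training signal is the sequence itself.
        for prev, nxt in zip(tokens, tokens[1:]):
            self.counts[prev][nxt] += 1

    def predict_next(self, token):
        followers = self.counts.get(token)
        if not followers:
            return None  # token never seen as a context
        return followers.most_common(1)[0][0]


if __name__ == "__main__":
    frames = [0.1, 0.5, 0.1, 0.5, 0.1]      # toy scalar "frames"
    actions = [1.0, -1.0, 1.0, -1.0, 1.0]   # toy steering commands
    tokens = build_sequence(frames, actions)
    model = BigramModel()
    model.fit(tokens)
    print(tokens)
    print(model.predict_next(tokens[0]))
```

The point of the sketch is only the data flow: once every modality is expressed as ids in a shared vocabulary, training reduces to predicting each token from the ones before it, with no labels beyond the sequence itself.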
Videos
Not Slowing Down: GAIA-1 to GPT Vision Tips, Nvidia B100 to Bard vs LLaVA · YouTube
Taxonomy
Topics: Human Motion and Animation · Generative Adversarial Networks and Image Synthesis · Artificial Intelligence in Games
