GAIA-1: A Generative World Model for Autonomous Driving

Anthony Hu; Lloyd Russell; Hudson Yeo; Zak Murez; George Fedoseev; Alex Kendall; Jamie Shotton; Gianluca Corrado

arXiv:2309.17080 · cs.CV · October 2, 2023 · 27 citations

TL;DR

GAIA-1 introduces a generative world model for autonomous driving that predicts and generates realistic driving scenarios using video, text, and actions, enhancing scene understanding and training efficiency.

Contribution

It presents a novel unsupervised sequence modeling approach that captures high-level scene structures and dynamics for autonomous driving applications.

Findings

01. Learns high-level scene structures and dynamics

02. Generates realistic driving scenarios

03. Improves autonomous training processes

Abstract

Autonomous driving promises transformative improvements to transportation, but building systems capable of safely navigating the unstructured complexity of real-world scenarios remains challenging. A critical problem lies in effectively predicting the various potential outcomes that may emerge in response to the vehicle's actions as the world evolves. To address this challenge, we introduce GAIA-1 ('Generative AI for Autonomy'), a generative world model that leverages video, text, and action inputs to generate realistic driving scenarios while offering fine-grained control over ego-vehicle behavior and scene features. Our approach casts world modeling as an unsupervised sequence modeling problem by mapping the inputs to discrete tokens, and predicting the next token in the sequence. Emerging properties from our model include learning high-level structures and scene dynamics,…
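
The abstract's central idea, mapping inputs to discrete tokens and predicting the next token in the sequence, can be illustrated with a toy sketch. This is not the paper's model. The uniform binning, the one-number-per-frame "image", the vocabulary layout, and the bigram count table are all assumptions made here for illustration, and every function name below is hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical vocabulary layout: each modality gets its own token range.
NUM_BINS = 8               # bins per modality (assumption)
IMAGE_OFFSET = 0           # image tokens occupy 0..7
ACTION_OFFSET = NUM_BINS   # action tokens occupy 8..15


def quantize(value, low, high, num_bins=NUM_BINS):
    """Map a continuous value in [low, high] to a bin index in [0, num_bins - 1]."""
    clipped = min(max(value, low), high)
    index = int((clipped - low) / (high - low) * num_bins)
    return min(index, num_bins - 1)


def tokenize_episode(frames, actions):
    """Interleave one image token and one action token per timestep.

    `frames` stands in for video: one brightness value in [0, 1] per frame.
    `actions` holds one steering value in [-1, 1] per frame.
    """
    tokens = []
    for brightness, steering in zip(frames, actions):
        tokens.append(IMAGE_OFFSET + quantize(brightness, 0.0, 1.0))
        tokens.append(ACTION_OFFSET + quantize(steering, -1.0, 1.0))
    return tokens


def fit_bigram(sequences):
    """Count how often each token follows each other token (no labels needed)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, token):
    """Return the most frequent successor of `token`, or None if it was never seen."""
    if token not in counts:
        return None
    return counts[token].most_common(1)[0][0]


if __name__ == "__main__":
    episode = tokenize_episode(
        frames=[0.1, 0.1, 0.9, 0.9],
        actions=[-0.5, -0.5, 0.5, 0.5],
    )
    model = fit_bigram([episode])
    print(episode)
    print(predict_next(model, episode[0]))
```

The training signal comes only from the token sequence itself, which is what makes the formulation unsupervised. A practical system would replace the bigram table with a far more expressive sequence model and the binning with a learned tokenizer.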

Code & Models

Repositories

yvanyin/drivingworld
pytorch

Videos

Not Slowing Down: GAIA-1 to GPT Vision Tips, Nvidia B100 to Bard vs LLaVA · YouTube

Taxonomy

Topics: Human Motion and Animation · Generative Adversarial Networks and Image Synthesis · Artificial Intelligence in Games