Doe-1: Closed-Loop Autonomous Driving with Large World Model

Wenzhao Zheng; Zetian Xia; Yuanhui Huang; Sicheng Zuo; Jie Zhou; Jiwen Lu

arXiv:2412.09627·cs.CV·December 13, 2024

TL;DR

Doe-1 introduces a unified closed-loop framework for autonomous driving that leverages a large world model to improve perception, prediction, and planning through autoregressive token generation, demonstrating effectiveness on the nuScenes dataset.

Contribution

The paper presents a novel large world model (Doe-1) that unifies perception, prediction, and planning in autonomous driving using a multi-modal transformer and token-based representation.
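The token-based unification described above can be pictured as a single interleaved sequence in which every generated token conditions on everything before it. The following is a minimal sketch under stated assumptions, not the paper's implementation: the `Token` type, the `rollout` function, the `dummy_generate` stand-in, and the per-step ordering (description, then action, then next frame) are all hypothetical names and choices made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Token:
    modality: str  # one of "image", "text", "action"
    value: int     # index into that modality's (hypothetical) vocabulary


# A generator maps (context so far, requested modality) to new tokens.
Generator = Callable[[List[Token], str], List[Token]]


def rollout(generate: Generator, first_obs: List[Token], steps: int) -> List[Token]:
    """Interleave modalities in one sequence, conditioning on all prior tokens."""
    seq = list(first_obs)
    for _ in range(steps):
        seq += generate(seq, "text")    # perception: free-form scene description
        seq += generate(seq, "action")  # planning: discrete action tokens
        seq += generate(seq, "image")   # prediction: next frame as image tokens
    return seq


def dummy_generate(seq: List[Token], modality: str) -> List[Token]:
    """Stand-in for a trained transformer (assumption): emits one token
    whose value is simply the current context length."""
    return [Token(modality, len(seq))]


seq = rollout(dummy_generate, [Token("image", 0)], steps=2)
print([t.modality for t in seq])
```

Because each predicted frame is appended to the same context that produces the next description and action, the loop feeds the model's own outputs back in, which is the sense in which such a rollout is closed-loop.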

Findings

01. Effective in visual question-answering tasks.
02. Generates accurate action-conditioned video.
03. Improves motion planning performance.

Abstract

End-to-end autonomous driving has received increasing attention due to its potential to learn from large amounts of data. However, most existing methods are still open-loop and suffer from weak scalability, lack of high-order interactions, and inefficient decision-making. In this paper, we explore a closed-loop framework for autonomous driving and propose a large Driving wOrld modEl (Doe-1) for unified perception, prediction, and planning. We formulate autonomous driving as a next-token generation problem and use multi-modal tokens to accomplish different tasks. Specifically, we use free-form texts (i.e., scene descriptions) for perception and generate future predictions directly in the RGB space with image tokens. For planning, we employ a position-aware tokenizer to effectively encode action into discrete tokens. We train a multi-modal transformer to autoregressively generate…
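To make the idea of encoding an action into discrete tokens concrete, here is a minimal sketch that uses plain uniform binning of a planar displacement. This is a swapped-in illustration, not the paper's position-aware tokenizer, whose details are not given in the abstract; the function names, the value range of ±1.0, and the bin count of 4 are all assumptions.

```python
from typing import Tuple


def encode(value: float, lo: float, hi: float, n_bins: int) -> int:
    """Map a continuous value to a bin index in [0, n_bins - 1]."""
    clipped = min(max(value, lo), hi)          # clamp out-of-range values
    idx = int((clipped - lo) / (hi - lo) * n_bins)
    return min(idx, n_bins - 1)                # value == hi falls in the last bin


def decode(idx: int, lo: float, hi: float, n_bins: int) -> float:
    """Map a bin index back to the center of its bin."""
    width = (hi - lo) / n_bins
    return lo + (idx + 0.5) * width


def tokenize_action(dx: float, dy: float, lo: float = -1.0,
                    hi: float = 1.0, n_bins: int = 4) -> Tuple[int, int]:
    """Tokenize a (dx, dy) displacement as one bin index per axis (assumed layout)."""
    return encode(dx, lo, hi, n_bins), encode(dy, lo, hi, n_bins)


tokens = tokenize_action(0.3, -5.0)
print(tokens)
```

With uniform bins, the reconstruction error of an in-range value is bounded by half a bin width, so the bin count trades vocabulary size against planning precision.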

Code & Models

Repositories

wzzheng/doe
PyTorch · Official

Taxonomy

Topics: Autonomous Vehicle Technology and Safety · Traffic Prediction and Management Techniques · Simulation Techniques and Applications

Methods: Softmax · Attention Is All You Need