EMMA: End-to-End Multimodal Model for Autonomous Driving

Jyh-Jing Hwang; Runsheng Xu; Hubert Lin; Wei-Chih Hung; Jingwei Ji; Kristy Choi; Di Huang; Tong He; Paul Covington; Benjamin Sapp; Yin Zhou; James Guo; Dragomir Anguelov; Mingxing Tan

arXiv:2410.23262 · cs.CV · September 24, 2025 · 5 cites

TL;DR

EMMA is a unified multimodal model that builds on a multimodal large language model to map raw camera sensor data directly to outputs for multiple autonomous driving tasks, with state-of-the-art motion planning results on nuScenes.

Contribution

This paper introduces EMMA, an end-to-end multimodal model that unifies autonomous driving tasks (planner trajectories, perception objects, and road graph elements) in a single language-based framework, generating the output for each task from a task-specific prompt.

Findings

01 State-of-the-art motion planning on nuScenes

02 Competitive 3D object detection on the Waymo Open Dataset (WOD)

03 Multi-task co-training improves results across all domains

Abstract

We introduce EMMA, an End-to-end Multimodal Model for Autonomous driving. Built upon a multi-modal large language model foundation like Gemini, EMMA directly maps raw camera sensor data into various driving-specific outputs, including planner trajectories, perception objects, and road graph elements. EMMA maximizes the utility of world knowledge from the pre-trained large language models, by representing all non-sensor inputs (e.g. navigation instructions and ego vehicle status) and outputs (e.g. trajectories and 3D locations) as natural language text. This approach allows EMMA to jointly process various driving tasks in a unified language space, and generate the outputs for each task using task-specific prompts. Empirically, we demonstrate EMMA's effectiveness by achieving state-of-the-art performance in motion planning on nuScenes as well as competitive results on the Waymo Open…
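
To make the idea of representing non-sensor inputs and outputs as text more concrete, here is a minimal sketch of how waypoints might be serialized into a prompt and parsed back from a model's text answer. Everything in it is an assumption for illustration: the function names (`waypoints_to_text`, `text_to_waypoints`, `build_prompt`), the two-decimal "(x, y)" format, and the prompt wording are hypothetical and are not taken from the paper.

```python
import re
from typing import List, Tuple

Waypoint = Tuple[float, float]  # hypothetical (x, y) position in meters

# Matches "(x, y)" pairs of plain decimal numbers, e.g. "(1.00, -2.50)".
_PAIR = re.compile(r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)")


def waypoints_to_text(waypoints: List[Waypoint]) -> str:
    """Serialize waypoints as plain text with two decimal places (assumed format)."""
    return " ".join(f"({x:.2f}, {y:.2f})" for x, y in waypoints)


def text_to_waypoints(text: str) -> List[Waypoint]:
    """Parse every "(x, y)" pair found in a text answer back into floats."""
    return [(float(x), float(y)) for x, y in _PAIR.findall(text)]


def build_prompt(task: str, command: str, history: List[Waypoint]) -> str:
    """Assemble a task-specific text prompt (hypothetical wording)."""
    return (
        f"Task: {task}\n"
        f"Navigation command: {command}\n"
        f"Past ego waypoints: {waypoints_to_text(history)}\n"
        "Answer with future ego waypoints."
    )


if __name__ == "__main__":
    history = [(0.0, 0.0), (1.2, 0.05)]
    print(build_prompt("motion planning", "go straight", history))
    # A text answer in the same assumed format can be parsed back into numbers.
    print(text_to_waypoints("(2.40, 0.10) (3.61, 0.18)"))
```

Switching tasks in this sketch only changes the `task` string and the expected answer format, which is the sense in which a single text interface can serve several driving tasks.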

Taxonomy

Topics: Transportation and Mobility Innovations