Language Prompt for Autonomous Driving

Dongming Wu; Wencheng Han; Yingfei Liu; Tiancai Wang; Cheng-zhong Xu; Xiangyu Zhang; Jianbing Shen

arXiv:2309.04379 · cs.CV · April 1, 2025 · 22 cites

TL;DR

This paper introduces NuPrompt, a large dataset of language prompts for driving scenes, and PromptTrack, a Transformer-based model that predicts object trajectories from natural language descriptions, advancing language-guided autonomous driving research.

Contribution

The paper creates the first object-centric language prompt dataset for driving scenes and formulates a new prompt-based trajectory prediction task, together with a baseline model.

Findings

01

PromptTrack achieves impressive performance on NuPrompt.

02

NuPrompt contains 40,147 language descriptions for driving scenes.

03

The dataset expands nuScenes with object-centric language annotations.

Abstract

A new trend in the computer vision community is to capture objects of interest following flexible human commands expressed as natural language prompts. However, progress in using language prompts in driving scenarios is stuck in a bottleneck due to the scarcity of paired prompt-instance data. To address this challenge, we propose the first object-centric language prompt set for driving scenes within 3D, multi-view, and multi-frame space, named NuPrompt. It expands the nuScenes dataset by constructing a total of 40,147 language descriptions, each referring to an average of 7.4 object tracklets. Based on the object-text pairs from the new benchmark, we formulate a novel prompt-based driving task, i.e., employing a language prompt to predict the described object trajectory across views and frames. Furthermore, we provide a simple end-to-end baseline model based on Transformer, named…
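
The pairing the abstract describes, one language prompt referring to several object tracklets across frames, can be pictured with a small data-structure sketch. This is only an illustration under stated assumptions: the class names `Tracklet` and `PromptAnnotation`, their fields, and both helper functions are hypothetical and do not reflect NuPrompt's actual annotation schema or the PromptTrack code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical simplification: a 3D position is just an (x, y, z) center.
Center3D = Tuple[float, float, float]


@dataclass
class Tracklet:
    """One object followed across frames (hypothetical structure)."""
    object_id: str
    # Maps frame index -> 3D center of the object in that frame.
    centers: Dict[int, Center3D] = field(default_factory=dict)


@dataclass
class PromptAnnotation:
    """One language prompt and every tracklet it refers to (hypothetical)."""
    prompt: str
    tracklets: List[Tracklet]


def trajectories_for_prompt(ann: PromptAnnotation) -> Dict[str, List[Center3D]]:
    """Return, per referred object, its centers ordered by frame index."""
    return {
        t.object_id: [t.centers[f] for f in sorted(t.centers)]
        for t in ann.tracklets
    }


def mean_tracklets_per_prompt(anns: List[PromptAnnotation]) -> float:
    """Average number of tracklets each prompt refers to."""
    return sum(len(a.tracklets) for a in anns) / len(anns)
```

Under this sketch, the prompt-based task amounts to producing the output of `trajectories_for_prompt` from the prompt text and sensor data alone, and the abstract's figure of 7.4 tracklets per description corresponds to the statistic computed by `mean_tracklets_per_prompt`.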

Code & Models

Repositories

wudongming97/prompt4driving
PyTorch · Official

Videos

Language Prompt for Autonomous Driving · underline

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Residual Connection · Adam · Byte Pair Encoding · Softmax · Dropout · Label Smoothing · Absolute Position Encodings