Language Prompt for Autonomous Driving
Dongming Wu, Wencheng Han, Yingfei Liu, Tiancai Wang, Cheng-zhong Xu, Xiangyu Zhang, Jianbing Shen

TL;DR
This paper introduces NuPrompt, a large dataset of language prompts for driving scenes, and PromptTrack, a Transformer-based baseline model that predicts object trajectories from natural language descriptions, advancing research on language-guided autonomous driving.
Contribution
It creates the first object-centric language prompt dataset for driving scenes and formulates a new prompt-based trajectory prediction task with a baseline model.
Findings
PromptTrack achieves impressive performance on NuPrompt.
NuPrompt contains 40,147 language descriptions for driving scenes.
The dataset expands nuScenes with object-centric language annotations.
Abstract
A new trend in the computer vision community is to capture objects of interest by following flexible human commands expressed as natural language prompts. However, progress in using language prompts for driving scenarios has stalled at a bottleneck caused by the scarcity of paired prompt-instance data. To address this challenge, we propose the first object-centric language prompt set for driving scenes within 3D, multi-view, and multi-frame space, named NuPrompt. It expands the nuScenes dataset by constructing a total of 40,147 language descriptions, each referring to an average of 7.4 object tracklets. Based on the object-text pairs from the new benchmark, we formulate a novel prompt-based driving task, i.e., employing a language prompt to predict the described object trajectory across views and frames. Furthermore, we provide a simple end-to-end baseline model based on Transformer, named…
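The abstract frames each NuPrompt annotation as one language prompt that refers to one or more object tracklets, and the task as recovering the trajectories of the referred objects. The sketch below illustrates that input/output shape under loudly stated assumptions: the class names (Box3D, Tracklet, PromptAnnotation), their fields, the threshold-based selection, and the precision/recall check are all hypothetical. They do not reproduce the NuPrompt file format, the PromptTrack architecture, or the benchmark's official metric.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Box3D:
    # Hypothetical per-frame 3D box: center (x, y, z) in meters, heading in radians.
    x: float
    y: float
    z: float
    yaw: float


@dataclass
class Tracklet:
    # Hypothetical tracklet: one object followed over time, keyed by frame index.
    track_id: str
    boxes: dict[int, Box3D] = field(default_factory=dict)

    def trajectory(self) -> list[tuple[float, float]]:
        # Ground-plane (x, y) centers in frame order, i.e. the object's trajectory.
        return [(self.boxes[f].x, self.boxes[f].y) for f in sorted(self.boxes)]


@dataclass
class PromptAnnotation:
    # Hypothetical record: one language prompt referring to one or more tracklets.
    prompt: str
    track_ids: list[str]


def average_tracklets_per_prompt(annotations: list[PromptAnnotation]) -> float:
    # The statistic the abstract reports as 7.4 for the full NuPrompt set.
    return sum(len(a.track_ids) for a in annotations) / len(annotations)


def select_referred_tracks(scores: dict[str, float], threshold: float = 0.5) -> list[str]:
    # Toy decoding (assumption): keep every track whose prompt-match score exceeds the threshold.
    return sorted(tid for tid, s in scores.items() if s > threshold)


def precision_recall(predicted: list[str], ground_truth: list[str]) -> tuple[float, float]:
    # Simple set-level check; NOT the benchmark's official tracking metric.
    pred, gt = set(predicted), set(ground_truth)
    hits = len(pred & gt)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(gt) if gt else 0.0
    return precision, recall


# Toy usage with made-up prompts, track IDs, boxes, and scores.
annotations = [
    PromptAnnotation("cars turning left", ["t1", "t2"]),
    PromptAnnotation("pedestrians crossing the road", ["t3", "t4", "t5"]),
]
car = Tracklet("t1", {1: Box3D(2.0, 1.0, 0.0, 0.1), 0: Box3D(1.0, 0.0, 0.0, 0.0)})
scores = {"t1": 0.9, "t2": 0.7, "t3": 0.1, "t6": 0.6}
predicted = select_referred_tracks(scores)

print(average_tracklets_per_prompt(annotations))  # 2.5
print(car.trajectory())  # [(1.0, 0.0), (2.0, 1.0)]
print(predicted)  # ['t1', 't2', 't6']
print(precision_recall(predicted, annotations[0].track_ids))  # (0.6666666666666666, 1.0)
```

Making a per-track decision rather than picking a single best match lets one prompt select any number of tracklets, which fits the abstract's point that a single description can refer to several objects at once.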
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Residual Connection · Adam · Byte Pair Encoding · Softmax · Dropout · Label Smoothing · Absolute Position Encodings
