An interactive enhanced driving dataset for autonomous driving

Haojie Feng; Peizhi Zhang; Mengjie Tian; Xinrui Zhang; Zhuoren Li; Junpeng Huang; Xiurong Wang; Junfan Zhu; Jianzhou Wang; Dongxiao Yin; Lu Xiong

arXiv:2602.20575 · cs.CV · February 25, 2026

Open Access · 1 dataset

TL;DR

This paper introduces the Interactive Enhanced Driving Dataset (IEDD), a large-scale multimodal dataset that pairs synthetic Bird's Eye View (BEV) videos with aligned language annotations, intended to improve Vision-Language-Action (VLA) models for autonomous driving.

Contribution

The paper presents a scalable pipeline for mining interactive driving segments and creates the IEDD-VQA dataset with synthetic BEV videos and semantic language alignment.

Findings

01. Benchmark results for ten Vision Language Models demonstrate dataset utility.

02. The dataset enables better assessment and fine-tuning of autonomous driving models.

03. Synthetic BEV videos improve multimodal interaction understanding.

Abstract

The evolution of autonomous driving towards full automation demands robust interactive capabilities; however, the development of Vision-Language-Action (VLA) models is constrained by the sparsity of interactive scenarios and inadequate multimodal alignment in existing data. To this end, this paper proposes the Interactive Enhanced Driving Dataset (IEDD). We develop a scalable pipeline to mine million-level interactive segments from naturalistic driving data based on interactive trajectories, and design metrics to quantify the interaction processes. Furthermore, the IEDD-VQA dataset is constructed by generating synthetic Bird's Eye View (BEV) videos where semantic actions are strictly aligned with structured language. Benchmark results evaluating ten mainstream Vision Language Models (VLMs) are provided to demonstrate the dataset's reuse value in assessing and fine-tuning the reasoning…
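
The abstract describes mining interactive segments from trajectories and scoring the interaction, but this page does not give the criteria. The following is only an illustrative sketch under stated assumptions: it treats two agents as "interacting" while their distance stays below a threshold, and uses the minimum gap as a stand-in metric. The function names, the distance-threshold rule, and the parameters `dist_thresh` and `min_len` are hypothetical and are not taken from the paper.

```python
import math


def pairwise_distance(traj_a, traj_b):
    """Per-frame Euclidean distance between two time-aligned (x, y) trajectories."""
    return [math.hypot(ax - bx, ay - by)
            for (ax, ay), (bx, by) in zip(traj_a, traj_b)]


def mine_interactive_segments(traj_a, traj_b, dist_thresh=10.0, min_len=3):
    """Return (start, end) frame ranges, end exclusive, where the agents stay close.

    ASSUMPTION: proximity below dist_thresh for at least min_len frames is a
    placeholder criterion, not the paper's interaction definition.
    """
    dist = pairwise_distance(traj_a, traj_b)
    segments = []
    start = None
    for i, d in enumerate(dist):
        if d < dist_thresh and start is None:
            start = i  # a candidate segment opens here
        elif d >= dist_thresh and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    # Close a segment that runs to the end of the recording.
    if start is not None and len(dist) - start >= min_len:
        segments.append((start, len(dist)))
    return segments


def min_gap(traj_a, traj_b, segment):
    """Hypothetical interaction metric: smallest distance inside a segment."""
    start, end = segment
    return min(pairwise_distance(traj_a[start:end], traj_b[start:end]))
```

A pipeline in this spirit would run such a filter over every agent pair in a naturalistic driving log and keep only the segments that pass; a real criterion would likely also account for heading, speed, and conflict points rather than distance alone.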

Code & Models

Datasets

Egikk/IEDD
dataset · 64 downloads

Taxonomy

Topics: Multimodal Machine Learning Applications · Autonomous Vehicle Technology and Safety · Advanced Neural Network Applications