Towards Safe Mobility: A Unified Transportation Foundation Model enabled by Open-Ended Vision-Language Dataset

Wenhui Huang; Songyan Zhang; Collister Chua; Yang Liang; Zhiqi Mao; Heng Yang; Chen Lv

arXiv:2604.22260 · cs.CV · April 27, 2026

TL;DR

This paper introduces LTD, a large-scale vision-language dataset for open-ended reasoning in urban traffic, and UniVLT, a foundation model that unifies microscopic autonomous driving and macroscopic traffic analysis, with the goal of safer mobility.

Contribution

The work presents LTD, a large-scale open-source vision-language dataset for open-ended traffic reasoning, and UniVLT, a foundation model that handles diverse traffic analysis tasks within a single architecture.

Findings

1. UniVLT achieves state-of-the-art performance on open-ended traffic reasoning tasks.

2. LTD enables reasoning over heterogeneous roadside camera observations in urban environments.

3. Existing models show limitations in complex multi-view traffic scenarios.

Abstract

Urban transportation systems face growing safety challenges that require scalable intelligence for emerging smart mobility infrastructures. While recent advances in foundation models and large-scale multimodal datasets have strengthened perception and reasoning in intelligent transportation systems (ITS), existing research remains largely centered on microscopic autonomous driving (AD), with limited attention to city-scale traffic analysis. In particular, open-ended safety-oriented visual question answering (VQA) and corresponding foundation models for reasoning over heterogeneous roadside camera observations remain underexplored. To address this gap, we introduce the Land Transportation Dataset (LTD), a large-scale open-source vision-language dataset for open-ended reasoning in urban traffic environments. LTD contains 11.6K high-quality VQA pairs collected from heterogeneous roadside…
