Towards Safe Mobility: A Unified Transportation Foundation Model enabled by Open-Ended Vision-Language Dataset
Wenhui Huang, Songyan Zhang, Collister Chua, Yang Liang, Zhiqi Mao, Heng Yang, Chen Lv

TL;DR
This paper introduces LTD, a large-scale vision-language dataset for urban traffic reasoning, and UniVLT, a foundation model that unifies microscopic autonomous driving and macroscopic traffic analysis, with the goal of safer mobility.
Contribution
The work presents LTD, a large-scale open-source vision-language dataset for open-ended reasoning in urban traffic environments, and UniVLT, a foundation model that handles both microscopic autonomous driving and macroscopic traffic analysis tasks within a single architecture.
Findings
UniVLT achieves state-of-the-art performance on open-ended traffic reasoning tasks.
LTD enables reasoning over heterogeneous roadside camera observations in urban environments.
Existing models show limitations in complex multi-view traffic scenarios.
Abstract
Urban transportation systems face growing safety challenges that require scalable intelligence for emerging smart mobility infrastructures. While recent advances in foundation models and large-scale multimodal datasets have strengthened perception and reasoning in intelligent transportation systems (ITS), existing research remains largely centered on microscopic autonomous driving (AD), with limited attention to city-scale traffic analysis. In particular, open-ended safety-oriented visual question answering (VQA) and corresponding foundation models for reasoning over heterogeneous roadside camera observations remain underexplored. To address this gap, we introduce the Land Transportation Dataset (LTD), a large-scale open-source vision-language dataset for open-ended reasoning in urban traffic environments. LTD contains 11.6K high-quality VQA pairs collected from heterogeneous roadside…
