Spatio-Temporal Data Enhanced Vision-Language Model for Traffic Scene Understanding
Jingtian Ma, Jingyuan Wang, Wayne Xin Zhao, Guoping Liu, Xiang Wen

TL;DR
This paper introduces a novel spatio-temporal enhanced vision-language model, ST-CLIP, for traffic scene understanding, effectively integrating spatio-temporal data with visual-textual analysis to improve scene comprehension.
Contribution
The paper presents the first integration of spatio-temporal information into vision-language models for traffic scene understanding, using a novel prompt learning approach.
Findings
Superior performance on real-world datasets
Effective in complex scene understanding scenarios
Works well with few-shot learning strategies
Abstract
Nowadays, navigation and ride-sharing apps have collected numerous images with spatio-temporal data. A core technology for analyzing such images, together with their associated spatio-temporal information, is Traffic Scene Understanding (TSU), which aims to provide a comprehensive description of the traffic scene. Unlike traditional spatio-temporal data analysis tasks, TSU depends on both spatio-temporal and visual-textual data, which introduces distinct challenges. However, recent research often treats TSU as a common image understanding task, ignoring the spatio-temporal information and overlooking the interrelations between different aspects of the traffic scene. To address these issues, we propose a novel Spatio-Temporal Enhanced Model based on CLIP (ST-CLIP) for TSU. Our model uses the classic vision-language model, CLIP, as the backbone, and designs a Spatio-temporal Context Aware…
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
