Vision-Based Natural Language Scene Understanding for Autonomous Driving: An Extended Dataset and a New Model for Traffic Scene Description Generation
Danial Sadrian Zadeh, Otman A. Basir, and Behzad Moshiri

TL;DR
This paper introduces a novel vision-based framework for generating natural language descriptions of traffic scenes in autonomous driving, utilizing a new dataset and a hybrid attention model for improved scene understanding.
Contribution
The paper presents a new dataset derived from BDD100K and a hybrid attention-based model for detailed traffic scene description generation.
Findings
The model achieves high scores on CIDEr and SPICE metrics.
The new dataset facilitates better training and evaluation.
Human judgments confirm the model's effectiveness.
Abstract
Traffic scene understanding is essential for enabling autonomous vehicles to accurately perceive and interpret their environment, thereby ensuring safe navigation. This paper presents a novel framework that transforms a single frontal-view camera image into a concise natural language description, effectively capturing spatial layouts, semantic relationships, and driving-relevant cues. The proposed model leverages a hybrid attention mechanism to enhance spatial and semantic feature extraction and integrates these features to generate contextually rich and detailed scene descriptions. To address the limited availability of specialized datasets in this domain, a new dataset derived from the BDD100K dataset has been developed, with comprehensive guidelines provided for its construction. Furthermore, the study offers an in-depth discussion of relevant evaluation metrics, identifying the most…
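The abstract does not spell out how the hybrid attention mechanism is built. As a rough illustration only, the sketch below assumes one common reading: a decoder query attends separately over spatial region features and over semantic concept embeddings, and the two context vectors are blended with a fixed gate. The function names (`attend`, `hybrid_context`), the `gate` parameter, and the toy vectors are all hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(query, items):
    """Scaled dot-product attention: returns the attention-weighted
    average of `items` and the attention weights themselves."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, item) / scale for item in items])
    dim = len(items[0])
    context = [sum(w * item[i] for w, item in zip(weights, items))
               for i in range(dim)]
    return context, weights


def hybrid_context(query, region_feats, concept_embs, gate=0.5):
    """Assumed fusion rule (not the paper's): blend a spatial context
    (over image regions) with a semantic context (over concept
    embeddings) using a scalar gate in [0, 1]."""
    spatial_ctx, spatial_w = attend(query, region_feats)
    semantic_ctx, semantic_w = attend(query, concept_embs)
    fused = [gate * s + (1.0 - gate) * c
             for s, c in zip(spatial_ctx, semantic_ctx)]
    return fused, spatial_w, semantic_w


# Toy inputs: two image regions and two concept embeddings, all 2-D.
query = [1.0, 0.0]
regions = [[1.0, 0.0], [0.0, 1.0]]
concepts = [[0.5, 0.5], [1.0, 1.0]]
fused, spatial_w, semantic_w = hybrid_context(query, regions, concepts)
```

In a trained captioning model the query would come from the language decoder's state and the gate would typically be learned rather than fixed. Both are simplified here to keep the sketch self-contained.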
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Autonomous Vehicle Technology and Safety
