Reading Between the Lanes: Text VideoQA on the Road

George Tom; Minesh Mathew; Sergi Garcia; Dimosthenis Karatzas; C.V. Jawahar

arXiv:2307.03948·cs.CV·June 17, 2025

Open Access 1 Repo

TL;DR

This paper introduces RoadTextVQA, a new dataset of driving videos with questions about road signs and text, aiming to improve video question answering for driver assistance systems.

Contribution

The paper presents RoadTextVQA, a novel dataset for VideoQA focused on road sign recognition in driving videos, facilitating research in in-vehicle support and multimodal reasoning.

Findings

01 State-of-the-art VideoQA models perform poorly on the dataset.

02 The dataset contains 3,222 videos and 10,500 questions.

03 These results highlight the need for improved VideoQA methods for driving scenarios.

Abstract

Text and signs around roads provide crucial information for drivers, vital for safe navigation and situational awareness. Scene text recognition in motion is a challenging problem because textual cues typically appear for a short time span, and early detection at a distance is necessary. Systems that exploit such information to assist the driver should not only extract and incorporate visual and textual cues from the video stream but also reason over time. To address this issue, we introduce RoadTextVQA, a new dataset for the task of video question answering (VideoQA) in the context of driver assistance. RoadTextVQA consists of 3,222 driving videos collected from multiple countries, annotated with 10,500 questions, all based on text or road signs present in the driving videos. We assess the performance of state-of-the-art video question answering models on our RoadTextVQA dataset,…

Code & Models

Repositories

georg3tom/RoadTextVQA (Official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques