Eyes on the Road: State-of-the-Art Video Question Answering Models Assessment for Traffic Monitoring Tasks

Joseph Raj Vishal; Divesh Basina; Aarya Choudhary; Bharatesh Chakravarthi

arXiv:2412.01132 · cs.CV · December 3, 2024

TL;DR

This paper evaluates current VideoQA models in traffic monitoring scenarios, highlighting their strengths in compositional reasoning and identifying key limitations in multi-object tracking and temporal coherence.

Contribution

The paper provides a comprehensive assessment of state-of-the-art VideoQA models on traffic data, revealing specific performance gaps and suggesting directions for future improvements.

Findings

01. VideoLLaMA-2 achieved 57% accuracy on the traffic VideoQA tasks.
02. Models struggle with multi-object tracking and temporal reasoning.
03. Current architectures need enhancements for complex scene understanding.

Abstract

Recent advances in video question answering (VideoQA) offer promising applications, especially in traffic monitoring, where efficient video interpretation is critical. Within intelligent transportation systems (ITS), answering complex, real-time queries such as "How many red cars passed in the last 10 minutes?" or "Was there an incident between 3:00 PM and 3:05 PM?" enhances situational awareness and decision-making. Despite progress in vision-language models, VideoQA remains challenging, especially in dynamic environments involving multiple objects and intricate spatiotemporal relationships. This study evaluates state-of-the-art VideoQA models on non-benchmark synthetic and real-world traffic sequences. The framework leverages GPT-4o to assess accuracy, relevance, and consistency across basic detection, temporal reasoning, and decomposition queries. VideoLLaMA-2 excelled with 57% accuracy, particularly in compositional…
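
The abstract describes using GPT-4o as a judge that scores answers for accuracy, relevance, and consistency across query categories. The following is a minimal sketch of how such judge verdicts could be aggregated per model and per category, assuming the judge output has already been parsed into records. The `Judgment` fields, the category labels, and the 0 to 1 score scale are hypothetical illustrations, not the paper's actual schema or code, and the sketch does not call any GPT-4o API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Judgment:
    """One parsed judge verdict (hypothetical schema, not the paper's)."""
    model: str          # VideoQA model that produced the answer
    category: str       # assumed labels: "detection", "temporal", "decomposition"
    correct: bool       # judge's accuracy verdict
    relevance: float    # judge score, assumed to lie in [0, 1]
    consistency: float  # judge score, assumed to lie in [0, 1]


def summarize(judgments):
    """Average judge verdicts per (model, category) pair."""
    buckets = defaultdict(list)
    for j in judgments:
        buckets[(j.model, j.category)].append(j)
    summary = {}
    for key, items in buckets.items():
        n = len(items)
        summary[key] = {
            "n": n,
            "accuracy": sum(j.correct for j in items) / n,
            "relevance": sum(j.relevance for j in items) / n,
            "consistency": sum(j.consistency for j in items) / n,
        }
    return summary


def overall_accuracy(judgments, model):
    """Fraction of a model's answers judged correct across all categories."""
    verdicts = [j.correct for j in judgments if j.model == model]
    return sum(verdicts) / len(verdicts) if verdicts else 0.0


# Toy records for illustration only; these are not results from the paper.
toy = [
    Judgment("model_a", "temporal", True, 1.0, 1.0),
    Judgment("model_a", "temporal", False, 0.5, 0.5),
    Judgment("model_a", "decomposition", True, 1.0, 1.0),
]
print(summarize(toy)[("model_a", "temporal")]["accuracy"])  # 0.5
print(overall_accuracy(toy, "model_a"))
```

Keeping per-category averages separate from the overall figure mirrors the abstract's breakdown by query type; a single headline number such as the reported 57% would correspond to something like `overall_accuracy`, although the paper's exact scoring rule may differ.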

Code & Models

Repositories

joe-rabbit/videoqa_pilot_study (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling