TrafficLens: Multi-Camera Traffic Video Analysis Using LLMs
Md Adnan Arefeen, Biplob Debnath, Srimat Chakradhar

TL;DR
TrafficLens is an algorithm that converts multi-camera traffic videos into detailed text descriptions using VLMs and LLMs, cutting video-to-text conversion time by up to 4x while maintaining information accuracy.
Contribution
It introduces a sequential, overlap-aware approach with object-level redundancy detection to accelerate multi-camera traffic video analysis using LLMs.
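As a rough illustration of what sequential, overlap-aware processing with object-level redundancy detection could look like, the Python sketch below walks through the cameras one at a time and skips the expensive VLM call whenever most of a clip's objects have already been described from an earlier camera. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Clip` type, the `describe_cameras` function, the `redundancy_threshold` parameter, and the idea that objects carry identifiers that match across cameras are all hypothetical and do not come from the paper summary.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class Clip:
    """Hypothetical container for one camera's clip (not from the paper)."""
    camera_id: str
    # Assumption: object identifiers are already matched across cameras.
    object_ids: FrozenSet[str]


def describe_cameras(
    clips: List[Clip],
    describe: Callable[[Clip], str],
    redundancy_threshold: float = 0.8,  # assumed value, chosen for illustration
) -> Tuple[Dict[str, str], List[str]]:
    """Process cameras in order and skip clips whose objects are mostly covered."""
    seen: Set[str] = set()
    descriptions: Dict[str, str] = {}
    skipped: List[str] = []
    for clip in clips:
        objs = clip.object_ids
        # Fraction of this clip's objects already described by earlier cameras.
        # A clip with no objects is treated as fully redundant.
        overlap = len(objs & seen) / len(objs) if objs else 1.0
        if overlap >= redundancy_threshold:
            skipped.append(clip.camera_id)  # no VLM call for this clip
        else:
            descriptions[clip.camera_id] = describe(clip)  # the costly step
            seen |= objs
    return descriptions, skipped


# Stand-in for a VLM call; it records which cameras were actually processed.
vlm_calls: List[str] = []


def fake_vlm(clip: Clip) -> str:
    vlm_calls.append(clip.camera_id)
    return f"{len(clip.object_ids)} objects seen by {clip.camera_id}"


clips = [
    Clip("cam_north", frozenset({"car_1", "car_2", "bus_7"})),
    Clip("cam_east", frozenset({"car_1", "car_2", "bus_7"})),
    Clip("cam_south", frozenset({"car_2", "bike_3", "truck_9"})),
]
descriptions, skipped = describe_cameras(clips, fake_vlm)
```

In this toy run the second camera sees only objects that the first camera already covered, so it is skipped and the stand-in VLM is called for two of the three clips. A real system would also need a way to establish which objects are the same across views, which this sketch simply assumes.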
Findings
Reduces video-to-text conversion time by up to 4x
Maintains high information accuracy in descriptions
Effective on real-world traffic datasets
Abstract
Traffic cameras are essential in urban areas, playing a crucial role in intelligent transportation systems. Multiple cameras at intersections enhance law enforcement capabilities, traffic management, and pedestrian safety. However, efficiently managing and analyzing multi-camera feeds poses challenges due to the vast amount of data. Analyzing such huge video data requires advanced analytical tools. While Large Language Models (LLMs) like ChatGPT, equipped with retrieval-augmented generation (RAG) systems, excel in text-based tasks, integrating them into traffic video analysis demands converting video data into text using a Vision-Language Model (VLM), which is time-consuming and delays the timely utilization of traffic videos for generating insights and investigating incidents. To address these challenges, we propose TrafficLens, a tailored algorithm for multi-camera traffic…
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Generative Adversarial Networks and Image Synthesis
