Leveraging Multimodal-LLMs Assisted by Instance Segmentation for Intelligent Traffic Monitoring
Murat Arda Onsu, Poonam Lohan, Burak Kantarci, Aisha Syed, Matthew Andrews, Sean Kennedy

TL;DR
This paper presents a traffic monitoring system based on a multimodal large language model, integrated with instance segmentation, for real-time vehicle and pedestrian analysis in urban simulations. It reports 84.3% accuracy in vehicle location recognition and 76.4% accuracy in steering direction determination.
Contribution
It introduces a novel combination of multimodal LLMs with instance segmentation for enhanced traffic scene understanding in simulated environments.
Findings
Achieves 84.3% accuracy in vehicle location recognition
Attains 76.4% accuracy in steering direction determination
Outperforms traditional traffic monitoring models
Abstract
A robust and efficient traffic monitoring system is essential for smart cities and Intelligent Transportation Systems (ITS), using sensors and cameras to track vehicle movements, optimize traffic flow, reduce congestion, enhance road safety, and enable real-time adaptive traffic control. Traffic monitoring models must comprehensively understand dynamic urban conditions and provide an intuitive user interface for effective management. This research leverages the LLaVA visual grounding multimodal large language model (LLM) for traffic monitoring tasks on the real-time Quanser Interactive Lab simulation platform, covering scenarios like intersections, congestion, and collisions. Cameras placed at multiple urban locations collect real-time images from the simulation, which are fed into the LLaVA model with queries for analysis. An instance segmentation model integrated into the cameras…
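The image-plus-query flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `CameraFrame` type, the query wording, the file paths, and the `stub_model` callable are all hypothetical placeholders. A real deployment would replace `stub_model` with a call into a LLaVA inference backend, and the role of the instance segmentation step is not shown because the abstract is truncated before describing it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CameraFrame:
    camera_id: str   # hypothetical identifier for a simulated camera
    image_path: str  # hypothetical path to a frame captured from the simulation


# Hypothetical query templates; the paper's actual prompts are not given here.
QUERIES: Dict[str, str] = {
    "location": "Where is each vehicle located in this scene?",
    "steering": "Which direction is each vehicle steering?",
}


def build_requests(frames: List[CameraFrame],
                   queries: Dict[str, str] = QUERIES) -> List[dict]:
    """Pair every camera frame with every text query."""
    return [
        {"camera_id": f.camera_id, "image": f.image_path,
         "task": task, "prompt": prompt}
        for f in frames
        for task, prompt in queries.items()
    ]


def run_monitoring(requests: List[dict],
                   model: Callable[[str, str], str]) -> List[dict]:
    """Send each (image, prompt) pair to the model and attach its answer."""
    return [{**r, "answer": model(r["image"], r["prompt"])} for r in requests]


def stub_model(image: str, prompt: str) -> str:
    """Placeholder standing in for a multimodal LLM call (assumption)."""
    return f"(no model attached) {prompt}"


frames = [
    CameraFrame("cam_intersection", "frames/intersection.png"),
    CameraFrame("cam_roundabout", "frames/roundabout.png"),
]
requests = build_requests(frames)
results = run_monitoring(requests, stub_model)
```

Keeping the model behind a plain callable means the same request-building code could be reused with any multimodal backend, which is a design convenience of this sketch rather than something stated in the paper.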
