Leveraging Multimodal-LLMs Assisted by Instance Segmentation for Intelligent Traffic Monitoring

Murat Arda Onsu; Poonam Lohan; Burak Kantarci; Aisha Syed; Matthew Andrews; Sean Kennedy

arXiv:2502.11304 · cs.AI · January 23, 2026

TL;DR

This paper presents a traffic monitoring system built on a multimodal large language model and assisted by instance segmentation. It analyzes vehicles and pedestrians in real-time urban simulations, reaching 84.3% accuracy in vehicle location recognition and 76.4% accuracy in steering direction determination.

Contribution

The paper combines multimodal LLMs with instance segmentation to improve traffic scene understanding in simulated environments.

Findings

01. Achieves 84.3% accuracy in vehicle location recognition

02. Attains 76.4% accuracy in steering direction determination

03. Outperforms traditional traffic monitoring models

Abstract

A robust and efficient traffic monitoring system is essential for smart cities and Intelligent Transportation Systems (ITS), using sensors and cameras to track vehicle movements, optimize traffic flow, reduce congestion, enhance road safety, and enable real-time adaptive traffic control. Traffic monitoring models must comprehensively understand dynamic urban conditions and provide an intuitive user interface for effective management. This research leverages the LLaVA visual grounding multimodal large language model (LLM) for traffic monitoring tasks on the real-time Quanser Interactive Lab simulation platform, covering scenarios like intersections, congestion, and collisions. Cameras placed at multiple urban locations collect real-time images from the simulation, which are fed into the LLaVA model with queries for analysis. An instance segmentation model integrated into the cameras…
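
The data flow described in the abstract (camera images from the simulation paired with text queries and passed to LLaVA) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the `CameraFrame` type, the query wording, and the file names are hypothetical and do not come from the paper. The prompt template follows the `USER: <image>\n... ASSISTANT:` convention commonly used with LLaVA-1.5 checkpoints; other LLaVA versions may expect a different template. The sketch only prepares requests and does not call a model.

```python
from dataclasses import dataclass

# Hypothetical query texts; the paper's actual prompts are not shown on this page.
QUERIES = {
    "location": "Where is each vehicle located in the scene?",
    "steering": "In which direction is each vehicle steering?",
}


@dataclass
class CameraFrame:
    camera_id: str   # hypothetical label for a camera placement in the simulation
    image_path: str  # path to an image captured by that camera


def build_prompt(question: str) -> str:
    """Format a single-image question in the LLaVA-1.5 chat style (assumed template)."""
    return f"USER: <image>\n{question} ASSISTANT:"


def build_requests(frames, queries=QUERIES):
    """Pair every frame with every query, producing one request dict per pair."""
    return [
        {
            "camera_id": frame.camera_id,
            "image_path": frame.image_path,
            "task": task,
            "prompt": build_prompt(question),
        }
        for frame in frames
        for task, question in queries.items()
    ]


if __name__ == "__main__":
    frames = [CameraFrame("cam_intersection", "frame_0001.png")]
    for request in build_requests(frames):
        print(request["camera_id"], request["task"], repr(request["prompt"]))
```

Each request would then be handed to whatever inference interface hosts the model; that step is left out because it depends on the deployment.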
