SafePLUG: Empowering Multimodal LLMs with Pixel-Level Insight and Temporal Grounding for Traffic Accident Understanding

Zihao Sheng; Zilin Huang; Yansong Qu; Jiancong Chen; Yuhao Luo; Yen-Jung Chen; Yue Leng; Sikai Chen

arXiv:2508.06763·cs.CV·April 3, 2026

TL;DR

SafePLUG is a framework that equips multimodal large language models with pixel-level understanding and temporal grounding for fine-grained traffic accident analysis.

Contribution

The paper introduces a multimodal framework with pixel-level understanding and temporal grounding capabilities, along with a new dataset for traffic accident understanding.

Findings

1. SafePLUG outperforms existing models on region-based question answering and segmentation tasks.
2. It effectively localizes temporal events and understands complex accident scenarios.
3. The framework advances fine-grained traffic scene analysis for safety applications.

Abstract

Multimodal large language models (MLLMs) have achieved remarkable progress across a range of vision-language tasks and demonstrate strong potential for traffic accident understanding. However, existing MLLMs in this domain primarily focus on coarse-grained image-level or video-level comprehension and often struggle to handle fine-grained visual details or localized scene components, limiting their applicability in complex accident scenarios. To address these limitations, we propose SafePLUG, a novel framework that empowers MLLMs with both Pixel-Level Understanding and temporal Grounding for comprehensive traffic accident analysis. SafePLUG supports both arbitrary-shaped visual prompts for region-aware question answering and pixel-level segmentation based on language instructions, while also enabling the recognition of temporally anchored events in traffic accident scenarios. To advance…

Code & Models

Repositories

https://zihaosheng.github.io/SafePLUG