Automating Crash Diagram Generation Using Vision-Language Models: A Case Study on Multi-Lane Roundabouts
Xiao Lu, Hao Zhen, Jidong J. Yang

TL;DR
This paper explores automating crash diagram creation from police reports using Vision-Language Models, demonstrating promising results with GPT-4o and highlighting current limitations in engineering visualization tasks.
Contribution
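The paper introduces a three-part structured prompt framework that guides VLMs through interpretation, extraction, and visual synthesis to generate crash diagrams, and it evaluates three models on 79 multilane roundabout crash reports using a 10-metric scoring system.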
Findings
GPT-4o achieved the highest average score (6.29 out of 10), ahead of Gemini-1.5-Flash (5.28) and Janus-4o (3.64).
VLMs show promise but still face limitations in engineering visualization.
The study provides a foundation for integrating AI into crash analysis workflows.
Abstract
Crash diagrams are essential tools in transportation safety analysis, yet their manual preparation remains time-consuming and prone to human variability. This study investigates the use of Vision-Language Models (VLMs) to automate crash diagram generation from police crash reports, focusing on multilane roundabouts as a challenging test case. A three-part structured prompt framework was developed to guide model reasoning through interpretation, extraction, and visual synthesis, while a 10-metric evaluation system was designed to assess diagram quality in terms of semantic accuracy, spatial fidelity, and visual clarity. Three popular models (GPT-4o, Gemini-1.5-Flash, and Janus-4o) were tested on 79 crash reports. GPT-4o achieved the highest average performance (6.29 out of 10), followed by Gemini-1.5-Flash (5.28) and Janus-4o (3.64). The analysis revealed GPT-4o's superior…
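As a rough illustration of the workflow the abstract describes, the sketch below assembles a three-part prompt and totals a 10-metric score. Only the stage names (interpretation, extraction, visual synthesis) and the count of ten metrics come from the abstract. The instruction wording, the function names `build_prompt` and `diagram_score`, and the scoring rule (each metric rated from 0 to 1 and summed to a total out of 10) are assumptions made for this example and may differ from the authors' actual prompts and rubric.

```python
from math import fsum

# Stage names are taken from the abstract; the instruction text is hypothetical.
STAGES = [
    ("interpretation", "Read the crash report and summarize what happened."),
    ("extraction", "List the vehicles, lanes, movements, and point of impact."),
    ("visual synthesis", "Draw a crash diagram from the extracted elements."),
]

NUM_METRICS = 10  # the abstract mentions a 10-metric evaluation system


def build_prompt(report_text: str) -> str:
    """Assemble one structured prompt with a numbered part per stage."""
    parts = [
        f"Part {i}: {name.title()}\n{instruction}"
        for i, (name, instruction) in enumerate(STAGES, start=1)
    ]
    parts.append("Crash report:\n" + report_text.strip())
    return "\n\n".join(parts)


def diagram_score(metric_scores: list[float]) -> float:
    """Sum ten per-metric ratings into a total out of 10.

    Assumption: each metric is rated in [0, 1] and weighted equally.
    """
    if len(metric_scores) != NUM_METRICS:
        raise ValueError(f"expected {NUM_METRICS} metric scores")
    if any(not 0.0 <= s <= 1.0 for s in metric_scores):
        raise ValueError("each metric score must lie in [0, 1]")
    return fsum(metric_scores)
```

Under these assumptions, a model's reported average would be the mean of `diagram_score` over all evaluated reports.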
