Automating Crash Diagram Generation Using Vision-Language Models: A Case Study on Multi-Lane Roundabouts

Xiao Lu; Hao Zhen; Jidong J. Yang

arXiv:2604.15332 · cs.HC · April 20, 2026

TL;DR

This paper explores automating crash diagram creation from police crash reports using Vision-Language Models. GPT-4o performed best of the three models tested (6.29 out of 10 on average), and the results highlight current limitations of VLMs in engineering visualization tasks.

Contribution

It introduces a structured prompt framework for VLMs to generate crash diagrams and evaluates model performance on multilane roundabout crash reports.
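The three stages named in the abstract (interpretation, extraction, visual synthesis) suggest how such a prompt could be assembled. The following is a minimal sketch only: the stage names come from the abstract, but the instruction wording, the `PromptStage` class, and the `build_prompt` helper are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptStage:
    name: str
    instruction: str


# Stage names follow the abstract; the instruction text is an assumption
# made for illustration, not the authors' actual prompt.
STAGES = (
    PromptStage(
        "interpretation",
        "Read the police crash report and summarize what happened.",
    ),
    PromptStage(
        "extraction",
        "List each vehicle, its entry leg, lane, movement, and point of impact.",
    ),
    PromptStage(
        "visual synthesis",
        "Draw a crash diagram of the roundabout showing the extracted elements.",
    ),
)


def build_prompt(report_text: str, stages=STAGES) -> str:
    """Join the numbered stage instructions and append the report text."""
    parts = [
        f"Part {i}: {stage.name.title()}\n{stage.instruction}"
        for i, stage in enumerate(stages, start=1)
    ]
    parts.append(f"Crash report:\n{report_text}")
    return "\n\n".join(parts)
```

Keeping the stages as data rather than as one hard-coded string makes it easy to reword or reorder a single stage when comparing prompt variants.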

Findings

01. GPT-4o achieved the highest average score: 6.29 out of 10, ahead of Gemini-1.5-Flash (5.28) and Janus-4o (3.64).

02. VLMs show promise but still face limitations in engineering visualization tasks.

03. The study provides a foundation for integrating AI into crash analysis workflows.

Abstract

Crash diagrams are essential tools in transportation safety analysis, yet their manual preparation remains time-consuming and prone to human variability. This study investigates the use of Vision-Language Models (VLMs) to automate crash diagram generation from police crash reports, focusing on multilane roundabouts as a challenging test case. A three-part structured prompt framework was developed to guide model reasoning through interpretation, extraction, and visual synthesis, while a 10-metric evaluation system was designed to assess diagram quality in terms of semantic accuracy, spatial fidelity, and visual clarity. Three popular models (GPT-4o, Gemini-1.5-Flash, and Janus-4o) were tested on 79 crash reports. GPT-4o achieved the highest average performance (6.29 out of 10), followed by Gemini-1.5-Flash (5.28) and Janus-4o (3.64). The analysis revealed GPT-4o's superior…
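One way a 10-metric evaluation could yield a score out of 10 is sketched below. This is an assumption for illustration, not the paper's scoring procedure: it supposes each of the ten metrics is scored between 0 and 1, that a diagram's score is their sum, and that a model's reported figure is the mean over all evaluated reports. The function names are hypothetical.

```python
from statistics import mean

NUM_METRICS = 10  # the abstract describes a 10-metric evaluation system


def diagram_score(metric_scores):
    """Sum ten per-metric scores (assumed to lie in [0, 1]) into a 0-10 score."""
    if len(metric_scores) != NUM_METRICS:
        raise ValueError(f"expected {NUM_METRICS} metric scores")
    if any(not 0.0 <= s <= 1.0 for s in metric_scores):
        raise ValueError("each metric score must lie in [0, 1]")
    return sum(metric_scores)


def model_average(per_report_scores):
    """Average the 0-10 diagram scores over all reports for one model."""
    return mean(diagram_score(scores) for scores in per_report_scores)
```

Under these assumptions, a model that fully satisfies every metric on half of its diagrams and none on the other half would average 5 out of 10.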
