CrashSight: A Phase-Aware, Infrastructure-Centric Video Benchmark for Traffic Crash Scene Understanding and Reasoning

Rui Gan; Junyi Ma; Pei Li; Xingyou Yang; Kai Chen; Sikai Chen; Bin Ran

arXiv:2604.08457·cs.CV·April 13, 2026

TL;DR

CrashSight is a large-scale, infrastructure-centric video benchmark designed to evaluate vision-language models on traffic crash understanding, emphasizing temporal and causal reasoning in safety-critical scenarios.

Contribution

CrashSight introduces a dataset of real-world roadside-camera crash videos annotated with multiple-choice questions that assess scene understanding and reasoning, filling a gap left by the ego-vehicle focus of existing autonomous driving benchmarks.

Findings

01. Current VLMs perform poorly on temporal and causal reasoning in crash scenarios.

02. The dataset includes 13K questions across 250 crash videos, covering multiple reasoning levels.

03. Analysis reveals specific failure modes of state-of-the-art models in safety-critical contexts.

Abstract

Cooperative autonomous driving requires traffic scene understanding from both vehicle and infrastructure perspectives. While vision-language models (VLMs) show strong general reasoning capabilities, their performance in safety-critical traffic scenarios remains insufficiently evaluated due to the ego-vehicle focus of existing benchmarks. To bridge this gap, we present CrashSight, a large-scale vision-language benchmark for roadway crash understanding using real-world roadside camera data. The dataset comprises 250 crash videos, annotated with 13K multiple-choice question-answer pairs organized under a two-tier taxonomy. Tier 1 evaluates the visual grounding of scene context and involved parties, while Tier 2 probes higher-level reasoning, including crash mechanics, causal attribution, temporal progression, and post-crash outcomes. We benchmark 8 state-of-the-art VLMs and show…
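To make the two-tier taxonomy concrete, here is a minimal Python sketch of how multiple-choice predictions could be scored per tier and per category. The `QAItem` fields, the category strings, and the sample predictions are illustrative assumptions; they are not the schema or results of the released dataset.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class QAItem:
    """One multiple-choice question (hypothetical schema, not the released format)."""
    video_id: str
    tier: int       # 1 = visual grounding, 2 = higher-level reasoning
    category: str   # e.g. "causal_attribution" (label names are assumed)
    answer: str     # correct option label, e.g. "B"


def accuracy_by_group(items, predictions, key):
    """Return {group: accuracy}, where key(item) selects the grouping value."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for item, pred in zip(items, predictions):
        group = key(item)
        total[group] += 1
        # Normalize the predicted option label before comparing.
        correct[group] += int(pred.strip().upper() == item.answer)
    return {group: correct[group] / total[group] for group in total}


# Toy data: two Tier 1 and two Tier 2 questions for one made-up video.
items = [
    QAItem("v001", 1, "scene_context", "A"),
    QAItem("v001", 1, "involved_parties", "C"),
    QAItem("v001", 2, "causal_attribution", "B"),
    QAItem("v001", 2, "temporal_progression", "D"),
]
predictions = ["A", "C", "A", "D"]  # made-up model outputs

by_tier = accuracy_by_group(items, predictions, key=lambda q: q.tier)
by_category = accuracy_by_group(items, predictions, key=lambda q: q.category)
print(by_tier)
print(by_category)
```

Grouping by an arbitrary `key` function keeps one scoring routine for both the tier-level and category-level breakdowns, which is convenient when comparing grounding accuracy against reasoning accuracy.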

Code & Models

Repositories

mcgrche/crashsight
github
