Learning Multimodal Embeddings for Traffic Accident Prediction and Causal Estimation

Ziniu Zhang; Minxuan Duan; Haris N. Koutsopoulos; Hongyang R. Zhang

arXiv:2512.02920 · cs.LG · May 15, 2026


TL;DR

This paper builds a multimodal dataset that combines road network data with satellite images for traffic accident prediction and causal analysis. Integrating the two modalities improves prediction accuracy by 3.7%, and the causal analysis links higher precipitation, speed, and seasonal factors to higher accident rates.

Contribution

It introduces a large annotated multimodal dataset (six U.S. states, nine million accident records, one million satellite images) and evaluates methods that integrate visual and network data for accident prediction and causal inference.

Findings

01 Integrating visual and network data improves accident prediction accuracy by 3.7%.

02 Satellite imagery features are crucial for accurate accident prediction.

03 Higher precipitation and speed, together with seasonal factors, significantly increase accident rates.

Abstract

We consider analyzing traffic accident patterns using both road network data and satellite images aligned to road graph nodes. Previous work on predicting accident occurrences relies primarily on structural features of the road network while overlooking physical and environmental information from the road surface and its surroundings. In this work, we construct a large multimodal dataset spanning six U.S. states, containing nine million traffic accident records from official sources and one million high-resolution satellite images aligned to the nodes of the road network. Additionally, every node is annotated with features such as the region's weather statistics and road type (e.g., residential vs. motorway), and each edge is annotated with traffic volume information (i.e., Average Annual Daily Traffic). Utilizing this dataset, we conduct a comprehensive evaluation of multimodal learning methods…
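The dataset layout described in the abstract (nodes carrying a road type, weather statistics, and an aligned satellite image; edges carrying Average Annual Daily Traffic) can be pictured with a small sketch. Everything named here is a hypothetical illustration rather than the authors' schema or model: the class and field names, the `image_embedding` stand-in for features extracted from a satellite image, and the simple concatenation in `fused_features` are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical road-type vocabulary; the abstract names only these two examples.
ROAD_TYPES = ["residential", "motorway"]


@dataclass
class RoadNode:
    node_id: int
    road_type: str                # e.g. "residential" or "motorway"
    weather: Dict[str, float]     # hypothetical keys, e.g. {"precipitation": 2.0}
    image_embedding: List[float]  # stand-in for features of the node's satellite image


@dataclass
class RoadGraph:
    nodes: Dict[int, RoadNode] = field(default_factory=dict)
    # (u, v) -> Average Annual Daily Traffic (AADT) on that edge
    aadt: Dict[Tuple[int, int], float] = field(default_factory=dict)

    def add_node(self, node: RoadNode) -> None:
        self.nodes[node.node_id] = node

    def add_edge(self, u: int, v: int, aadt: float) -> None:
        self.aadt[(u, v)] = aadt


def fused_features(graph: RoadGraph, node_id: int) -> List[float]:
    """Concatenate network-side features with the image embedding for one node.

    Plain concatenation is an assumed, simplest-possible fusion; it is not
    claimed to be the method evaluated in the paper.
    """
    node = graph.nodes[node_id]
    # One-hot encoding of the road type.
    one_hot = [1.0 if node.road_type == t else 0.0 for t in ROAD_TYPES]
    # Weather statistics in a fixed (sorted-key) order.
    weather = [node.weather[k] for k in sorted(node.weather)]
    # Mean AADT over the edges that touch this node (0.0 if there are none).
    incident = [a for (u, v), a in graph.aadt.items() if node_id in (u, v)]
    mean_aadt = sum(incident) / len(incident) if incident else 0.0
    return one_hot + weather + [mean_aadt] + node.image_embedding
```

Calling `fused_features(graph, node_id)` returns one flat feature vector per node, which could be passed to any downstream classifier; a real pipeline would replace `image_embedding` with the output of an image encoder.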
