Predicting and Explaining Traffic Crash Severity Through Crash Feature Selection

Andrea Castellani; Zacharias Papadovasilakis; Giorgos Papoutsoglou; Mary Cole; Brian Bautsch; Tobias Rodemann; Ioannis Tsamardinos; Angela Harden

arXiv:2508.11504 · cs.LG · August 18, 2025

TL;DR

This study develops an explainable machine learning framework using a large Ohio crash dataset to predict crash severity and identify key risk factors, supporting data-driven traffic safety policies.

Contribution

It introduces a transparent AutoML and explainability approach for crash severity prediction, highlighting influential features and offering a scalable, interpretable methodology.

Findings

01

Final model achieved 85.6% AUC-ROC on training data.

02

Identified 17 key predictive features across multiple categories.

03

Environmental and contextual factors were more influential than impairment.

Abstract

Motor vehicle crashes remain a leading cause of injury and death worldwide, necessitating data-driven approaches to understand and mitigate crash severity. This study introduces a curated dataset of more than 3 million people involved in accidents in Ohio over six years (2017-2022), aggregated to more than 2.3 million vehicle-level records for predictive analysis. The primary contribution is a transparent and reproducible methodology that combines Automated Machine Learning (AutoML) and explainable artificial intelligence (AI) to identify and interpret key risk factors associated with severe crashes. Using the JADBio AutoML platform, predictive models were constructed to distinguish between severe and non-severe crash outcomes. The models underwent rigorous feature selection across stratified training subsets, and their outputs were interpreted using SHapley Additive exPlanations (SHAP)…
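
The aggregation step the abstract mentions, collapsing person-level records into vehicle-level records with a severe or non-severe label, can be illustrated with a minimal sketch. It is not the authors' pipeline: the field names (`crash_id`, `vehicle_id`, `age`, `injury`), the 1-5 injury scale, the severity cutoff, and the derived columns are all hypothetical choices made for illustration.

```python
from collections import defaultdict

# Hypothetical person-level records; field names and values are invented.
people = [
    {"crash_id": 1, "vehicle_id": 1, "age": 34, "injury": 1},
    {"crash_id": 1, "vehicle_id": 1, "age": 7, "injury": 4},
    {"crash_id": 1, "vehicle_id": 2, "age": 52, "injury": 2},
    {"crash_id": 2, "vehicle_id": 1, "age": 19, "injury": 1},
]

# Assumed cutoff on an assumed 1-5 injury scale (not taken from the paper).
SEVERE_THRESHOLD = 4


def aggregate_to_vehicles(records, threshold=SEVERE_THRESHOLD):
    """Collapse person-level rows into one row per (crash_id, vehicle_id)."""
    groups = defaultdict(list)
    for row in records:
        groups[(row["crash_id"], row["vehicle_id"])].append(row)

    vehicles = []
    for (crash_id, vehicle_id), rows in sorted(groups.items()):
        vehicles.append({
            "crash_id": crash_id,
            "vehicle_id": vehicle_id,
            "n_occupants": len(rows),
            "min_occupant_age": min(r["age"] for r in rows),
            # A vehicle is labeled severe if any occupant meets the cutoff.
            "severe": any(r["injury"] >= threshold for r in rows),
        })
    return vehicles


vehicles = aggregate_to_vehicles(people)
for v in vehicles:
    print(v)
```

Labeling a vehicle as severe when any occupant meets the cutoff is only one possible convention; a different rule, such as using the driver's injury alone, would change the class balance that a downstream classifier sees.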
