# Predicting errors in accident hotspots and investigating spatiotemporal, weather, and behavioral factors using interpretable machine learning: An analysis of telematics big data

**Authors:** Ali Golestani, Nazila Rezaei, Mohammad-Reza Malekpour, Naser Ahmadi, Seyed Mohammad-Navid Ataei, Sepehr Khosravi, Ayyoob Jafari, Saeid Shahraz, Farshad Farzadfar, Habtamu Setegn Ngusie

PMC · DOI: 10.1371/journal.pone.0326483 · PLOS One · 2025-07-08

## TL;DR

This study applies interpretable machine learning to telematics data from intercity buses in Iran to predict errors in road accident hotspots and to identify the spatiotemporal, weather, and behavioral factors that drive those predictions.

## Contribution

The study applies interpretable ML, specifically SHAP explanations of the best-performing model, to telematics data in order to identify the key predictors of errors in road accident hotspots in Iran.
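SHAP builds on Shapley values, which split the gap between a model's prediction for one record and a reference prediction across the input features. As a minimal sketch of that idea (not the paper's pipeline, which would more plausibly rely on the `shap` library's optimized estimators), the Python below computes exact Shapley values by enumerating feature subsets. The `toy_risk` model, its coefficients, and the three unnamed features are hypothetical and chosen only so the result can be checked by hand.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one record `x`.

    A feature that is "absent" from a coalition takes its value from
    `baseline` (a simplifying assumption; SHAP typically averages over
    a background dataset instead of a single reference point).
    """
    n = len(x)

    def value(coalition):
        # Build a record that mixes real values (present features)
        # with baseline values (absent features), then score it.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(z):
    # Hypothetical model with an interaction between features 0 and 1.
    return 0.5 * z[0] + 0.3 * z[1] + 0.2 * z[0] * z[1] + 0.1 * z[2]


x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_risk, x, baseline)
print([round(p, 3) for p in phi])
```

In this toy case the interaction term is shared equally between features 0 and 1, and the attributions add up to `toy_risk(x) - toy_risk(baseline)`. Exact enumeration grows exponentially with the number of features, which is why practical tools use model-specific shortcuts or sampling.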

## Key findings

- XGBoost performed best of the six models compared, with an AUC of 91.70% (95% uncertainty interval: 91.33% to 92.09%) for predicting errors in road accident hotspots.
- Spatial variables, particularly the province of error and road type, were the most important predictors.
- Fatigue, a behavioral error, and weather variables such as dew point and relative humidity also influenced predictions, whereas temporal variables did not contribute significantly.

## Abstract

Road traffic accidents (RTAs) are a major public health concern with significant health and economic burdens. Identifying high-risk areas and key contributing factors is essential for developing targeted interventions. While machine learning (ML) has been increasingly used to predict RTAs, the lack of interpretability limits its applicability in policymaking. This study aimed to utilize interpretable ML models to predict the occurrence of errors in road accident hotspots using telematics data in Iran and interpret the most influential predictors.

We utilized data collected via telematics from 1673 intercity buses throughout the year 2020, spanning cities across all provinces of Iran. Merging this data with a weather-related dataset resulted in a comprehensive dataset containing location, time, weather, and error type variables. After preprocessing, 619,988 records without any missing values were used to train and compare the performance of six machine learning models including logistic regression, K-nearest neighbors, random forest, Extreme Gradient Boosting (XGBoost), Naïve Bayes, and support vector machine. The best model was selected for interpretation using SHAP (SHapley Additive exPlanation). Due to the high imbalance in the outcome, an ensemble approach was applied to train all models.
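The abstract does not describe the ensemble scheme used to handle the imbalanced outcome. One common reading is an under-sampling ensemble: train several copies of a model, each on all minority-class records plus an equally sized random draw from the majority class, then average their predicted probabilities. The sketch below illustrates that idea under those assumptions. The function names, the synthetic data, and the toy centroid-based base learner are all hypothetical stand-ins; in practice the base learner would be one of the six models listed above, and a library implementation such as imbalanced-learn's `BalancedBaggingClassifier` could be used instead.

```python
import math
import random
from statistics import mean


def balanced_subsets(labels, n_subsets, seed=0):
    """Yield index lists: every minority row (label 1) plus an equally
    sized random sample of majority rows (label 0).

    Assumes label 1 is the minority class.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    for _ in range(n_subsets):
        yield pos + rng.sample(neg, len(pos))


def fit_centroid_scorer(X, y):
    """Toy base learner: score = relative closeness to the class-1 centroid."""
    def centroid(rows):
        return [mean(col) for col in zip(*rows)]

    c1 = centroid([x for x, t in zip(X, y) if t == 1])
    c0 = centroid([x for x, t in zip(X, y) if t == 0])

    def predict_proba(x):
        d1, d0 = math.dist(x, c1), math.dist(x, c0)
        return 0.5 if d0 + d1 == 0 else d0 / (d0 + d1)

    return predict_proba


def fit_undersampling_ensemble(X, y, n_subsets=10, seed=0):
    """Train one base learner per balanced subset; average their scores."""
    members = [
        fit_centroid_scorer([X[i] for i in idx], [y[i] for i in idx])
        for idx in balanced_subsets(y, n_subsets, seed)
    ]
    return lambda x: mean(m(x) for m in members)


# Synthetic, imbalanced data: 1000 negatives near (0, 0), 50 positives near (3, 3).
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(1000)]
X += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(50)]
y = [0] * 1000 + [1] * 50

model = fit_undersampling_ensemble(X, y)
print(model([3.0, 3.0]), model([0.0, 0.0]))
```

Each ensemble member sees a balanced training set, so no single model is dominated by the majority class, while averaging across members recovers information from majority records that any one subset leaves out.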

XGBoost demonstrated the best performance with an area under the curve (AUC) of 91.70% (95% uncertainty interval: 91.33% to 92.09%). SHAP values highlighted spatial variables, particularly the province of error and road type, as the most critical features for predicting errors in accident hotspots in Iran. Fatigue, as a behavioral error, was associated with a higher predicted likelihood of errors in accident hotspots, and certain weather-related variables, including dew point and relative humidity, also exhibited importance. However, temporal variables did not contribute significantly to the prediction.
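The abstract reports an AUC with a 95% uncertainty interval but does not say how the interval was obtained. A percentile bootstrap is one widely used option, and the sketch below shows it as an assumption, not as the authors' procedure. `auc` uses the rank-sum (Mann-Whitney) formulation, and `bootstrap_auc_interval` resamples records with replacement; both function names and the synthetic scores are hypothetical.

```python
import random


def auc(labels, scores):
    """AUC as the chance that a random positive outscores a random
    negative, with ties counted as one half (rank-sum formulation)."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        # Tied scores occupy 1-based ranks i+1 .. j; give each the average.
        avg_rank = (i + 1 + j) / 2
        rank_sum_pos += avg_rank * sum(lab for _, lab in pairs[i:j])
        i = j
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_interval(labels, scores, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC."""
    if len(set(labels)) < 2:
        raise ValueError("labels must contain both classes")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[k] for k in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auc(ys, [scores[k] for k in idx]))
    stats.sort()
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi


# Synthetic scores: positives tend to score higher than negatives.
rng = random.Random(1)
labels = [1] * 60 + [0] * 240
scores = [rng.gauss(1, 1) for _ in range(60)] + [rng.gauss(0, 1) for _ in range(240)]

point = auc(labels, scores)
lo, hi = bootstrap_auc_interval(labels, scores)
print(f"AUC = {point:.3f}, 95% interval: {lo:.3f} to {hi:.3f}")
```

With a test set as large as the one implied by 619,988 records, a bootstrap interval would be narrow, which is consistent with the tight range reported above, though other interval methods would give similar widths.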

By integrating spatiotemporal, behavioral, and weather-related variables, our study highlighted the dominance of spatial factors in predicting errors in accident hotspots. These findings underscore the need for targeted road infrastructure improvements and data-driven policymaking to mitigate RTA risks.

## Full-text entities

- **Diseases:** Fatigue (MESH:D005221), RTA (MESH:D000141), RTAs (MESH:D000081084)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12237018/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12237018/full.md

## References

76 references — full list in the complete paper: https://tomesphere.com/paper/PMC12237018/full.md

---
Source: https://tomesphere.com/paper/PMC12237018