# From Prediction to Prevention: Using Text Mining and Explainable Machine Learning for Urban Bus Accident Analytics

**Authors:** Bowei Chen, Yufei Huang, Yu Zheng, Xiaofeng Liu

PMC · DOI: 10.1111/risa.70183 · Risk Analysis · 2026-01-30

## TL;DR

Using over 15,000 bus accident records (2013–2018) from a Tier-2 city in Jiangsu Province, China, this study applies text mining and explainable machine learning to predict severe urban bus accidents and identify the factors associated with them.

## Contribution

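The study introduces a modular framework that integrates structured accident records with unstructured narrative text: a structural topic model (STM) extracts latent accident scenarios, an XGBoost classifier predicts accident severity, and SHAP interprets the model outputs at both global and local levels.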

## Key findings

- Incorporating text-derived accident patterns markedly improves both the predictive accuracy and the interpretability of the severity model.
- Rear-end collisions involving electric scooters, sudden stops leading to passenger injuries, and left-turn maneuvers in congested areas are linked to elevated risk.
- SHAP explanations provide actionable insights for targeted safety interventions in urban transit.

## Abstract

Urban bus accidents present major safety and operational challenges, particularly in densely populated metropolitan areas. This study develops a machine learning‐based analytical framework to identify, quantify, and interpret the factors associated with severe bus accidents. The framework integrates three components: (i) a structural topic model (STM) to extract latent accident scenarios from unstructured narrative data, (ii) an extreme gradient boosting (XGBoost) classifier to predict accident severity, and (iii) SHapley Additive exPlanations (SHAP) for post hoc interpretation of model outputs at both global and local levels. Using over 15,000 bus accident records (2013–2018) from a Tier‐2 city in Jiangsu Province, China, the findings show that incorporating text‐derived accident patterns markedly improves both predictive accuracy and interpretability. The analysis highlights elevated risks linked to rear‐end collisions involving electric scooters, sudden stops leading to passenger injuries, and left‐turn maneuvers in congested areas. SHAP‐based explanations yield actionable insights for drivers, transit operators, and policymakers, facilitating targeted safety interventions. Methodologically, this study advances interpretable risk modeling through the integration of structured and unstructured data, and the modular analytical framework provides a transferable foundation for applications across diverse domains of transportation and risk analysis.
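To make the global and local interpretation step concrete, here is a minimal sketch of the Shapley attribution idea that SHAP builds on, written in plain Python. It is not the authors' pipeline and does not use the `shap` or `xgboost` libraries. The feature names, the `toy_severity_score` function with its coefficients, the all-zero baseline, and the two records are all invented for illustration. The real SHAP library typically averages over a background dataset instead of using a single baseline record.

```python
from itertools import combinations
from math import factorial

# Hypothetical features: two structured fields plus one text-derived topic
# proportion of the kind a topic model might produce. Not taken from the paper.
FEATURES = ["left_turn", "congested", "topic_scooter_rear_end"]


def toy_severity_score(x):
    """Stand-in for a trained classifier's output; coefficients are invented."""
    score = 0.10
    score += 0.15 * x["left_turn"]
    score += 0.05 * x["congested"]
    score += 0.20 * x["left_turn"] * x["congested"]  # interaction term
    score += 0.30 * x["topic_scooter_rear_end"]
    return score


def value(model, x, baseline, subset):
    """Model output when only the features in `subset` take the record's values."""
    mixed = {f: (x[f] if f in subset else baseline[f]) for f in FEATURES}
    return model(mixed)


def shapley_values(model, x, baseline):
    """Exact Shapley attribution of model(x) - model(baseline) to each feature."""
    n = len(FEATURES)
    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(n):  # k = size of the coalition that excludes f
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                gain = value(model, x, baseline, s | {f}) - value(model, x, baseline, s)
                total += weight * gain
        phi[f] = total
    return phi


def global_importance(model, records, baseline):
    """Global view: mean absolute local attribution per feature."""
    sums = {f: 0.0 for f in FEATURES}
    for x in records:
        for f, v in shapley_values(model, x, baseline).items():
            sums[f] += abs(v)
    return {f: s / len(records) for f, s in sums.items()}


baseline = {"left_turn": 0, "congested": 0, "topic_scooter_rear_end": 0.0}
records = [
    {"left_turn": 1, "congested": 1, "topic_scooter_rear_end": 0.0},
    {"left_turn": 0, "congested": 1, "topic_scooter_rear_end": 0.8},
]

local = shapley_values(toy_severity_score, records[0], baseline)
overall = global_importance(toy_severity_score, records, baseline)
print(local)
print(overall)
```

The local attributions for a record sum to the gap between that record's score and the baseline score, which is the additivity property that lets per-accident explanations be read as contributions. In this toy setup, the interaction term is split equally between `left_turn` and `congested`. Enumerating every coalition costs time exponential in the number of features, so this exact form is only practical for very small feature sets.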

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12857609/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12857609/full.md

## References

99 references — full list in the complete paper: https://tomesphere.com/paper/PMC12857609/full.md

---
Source: https://tomesphere.com/paper/PMC12857609