# Multidisciplinary prediction of running-related injuries using machine learning

**Authors:** Han Wu, Katherine Brooke-Wavell, Michael R. Barnes, Zainab Awan, Sarabjit Mastana, Sam Allen, Richard C. Blagrove

PMC · DOI: 10.1038/s41746-026-02413-y · NPJ Digital Medicine · 2026-02-06

## TL;DR

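This study builds a weekly prediction dataset from multidisciplinary risk factors collected in 142 competitive endurance runners and benchmarks machine learning models for predicting running-related injuries (RRIs); random forest performed best, with an AUC of about 0.78.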

## Contribution

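The paper introduces a machine learning-ready weekly dataset and a reproducible methodological framework for predicting running-related injuries from risk factors spanning genetics, history, muscular strength, biomechanics, body composition, nutrition, and training.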

## Key findings

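- Random forest achieved the best injury prediction performance (AUC = 0.781 ± 0.016 and 0.784 ± 0.014 for the two risk-factor sets), significantly higher (q < 0.05) than most other algorithms.
- Only logistic regression improved significantly (q < 0.05) when trained on the broader range of risk factors rather than the selection of high-quality risk factors.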
- The dataset includes 6181 weekly samples from 142 endurance runners monitored for 12 months.

## Abstract

The causes of endurance running-related injury (RRI) are multifactorial, yet little research has been conducted which utilizes multidisciplinary risk factors for individualized RRI prediction. This paper presents a machine learning (ML)-ready RRI weekly prediction dataset using evidence-based multidisciplinary risk factors. Risk factors in genetic single-nucleotide polymorphisms, history, muscular strength, biomechanics, body composition, nutrition, and training were collected from competitive endurance runners (n = 142), who were prospectively monitored for 12 months for RRIs, accumulating 6181 weekly samples. ML models were fitted using (i) risk factors with high-level supporting evidence, and (ii) a broader range of risk factors to establish a performance baseline. Model performance (AUC = 0.784 ± 0.014) showed moderate improvement compared to previous RRI prediction modeling. Random forest achieved the best performance (AUC = 0.781 ± 0.016, 0.784 ± 0.014), which was significantly higher (q < 0.05) than most other algorithms. Only logistic regression achieved significantly improved (q < 0.05) performance when trained using a broader range of risk factors compared to a selection of high-quality risk factors. This study introduces a reproducible methodological framework for future ML sports injury prediction research and a valuable dataset for pooling in larger-scale analytics. Comparisons among different ML methods revealed nuanced insights into the interaction between data structure and model suitability.
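Because each runner contributes many weekly samples, one plausible way to obtain an estimate of the form "AUC = mean ± SD" is to score held-out folds in which no runner appears in more than one fold. The sketch below illustrates only that evaluation step, using the Python standard library and synthetic data. The fold count, the grouping by runner, and the helper names (`auc`, `grouped_folds`, `fold_aucs`) are assumptions made for illustration; the abstract does not describe the paper's actual validation protocol. In a real pipeline the scores for each fold would come from a model (for example a random forest) fitted on the remaining folds.

```python
import random
import statistics


def auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def grouped_folds(groups, k, seed=0):
    """Assign every group (runner) to exactly one of k folds."""
    unique = sorted(set(groups))
    random.Random(seed).shuffle(unique)
    fold_of = {g: i % k for i, g in enumerate(unique)}
    return [fold_of[g] for g in groups]


def fold_aucs(labels, scores, groups, k=5, seed=0):
    """AUC on each held-out fold, with folds split at runner level."""
    folds = grouped_folds(groups, k, seed)
    results = []
    for f in range(k):
        idx = [i for i, fold in enumerate(folds) if fold == f]
        results.append(auc([labels[i] for i in idx], [scores[i] for i in idx]))
    return results


# Synthetic stand-in data (NOT the paper's dataset): 30 hypothetical
# runners with 20 weekly samples each and a noisy risk score per week.
rng = random.Random(42)
runners, labels, scores = [], [], []
for runner in range(30):
    baseline = rng.gauss(0.0, 1.0)  # hypothetical runner-level risk
    for _week in range(20):
        risk = baseline + rng.gauss(0.0, 1.0)
        runners.append(runner)
        scores.append(risk)  # stand-in for a model's predicted risk
        labels.append(1 if risk + rng.gauss(0.0, 1.0) > 1.5 else 0)

aucs = fold_aucs(labels, scores, runners, k=5)
print(f"AUC = {statistics.mean(aucs):.3f} ± {statistics.stdev(aucs):.3f}")
```

Splitting by runner rather than by week keeps a runner's correlated weekly samples out of both the training and test sides of a fold, which would otherwise tend to inflate the reported AUC.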

## Full-text entities

- **Diseases:** RRI (MESH:D020195), sports injury (MESH:D001265), injuries (MESH:D014947)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12987969/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12987969/full.md

## References

8 references — full list in the complete paper: https://tomesphere.com/paper/PMC12987969/full.md

---
Source: https://tomesphere.com/paper/PMC12987969