# AI-based Prediction of Independent Construction Safety Outcomes from Universal Attributes

**Authors:** Henrietta Baker, Matthew R. Hallowell, Antoine J.-P. Tixier

arXiv: 1908.05972 · 2026-05-21

## TL;DR

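This study validates an approach that uses NLP to extract attributes from raw construction incident reports and machine learning to predict safety outcomes from those attributes. With outcomes supplied by independent human annotations, a dataset of more than 90,000 reports, and new models with stacking, attributes remain highly predictive, and injury severity is now well predicted.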

## Contribution

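The paper validates and improves an earlier machine learning framework for predicting safety outcomes from attributes. Outcomes now come from independent human annotations rather than NLP, the dataset exceeds 90,000 reports, XGBoost and linear SVM are added along with model stacking, and the experimental setup uses more appropriate performance metrics and a per-category analysis of attribute importance.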

## Key findings

- Attributes are highly predictive of safety outcomes.
- Injury severity is now well predicted, unlike in previous work.
- Model stacking and larger datasets enhance prediction accuracy.

## Abstract

This paper significantly improves on, and completes the validation of, an approach proposed in previous research in which safety outcomes were predicted from attributes with machine learning. As in the original study, we use Natural Language Processing (NLP) to extract fundamental attributes from raw incident reports and train machine learning models to predict safety outcomes. The outcomes predicted here are injury severity, injury type, body part impacted, and incident type. However, unlike in the original study, safety outcomes were not extracted via NLP but were provided by independent human annotations, eliminating any potential source of artificial correlation between predictors and predictands. Results show that attributes are still highly predictive, confirming the validity of the original approach. Other improvements brought by the current study include the use of (1) a much larger dataset featuring more than 90,000 reports, (2) two new models, XGBoost and linear SVM (Support Vector Machine), (3) model stacking, (4) a more straightforward experimental setup with more appropriate performance metrics, and (5) an analysis of per-category attribute importance scores. Finally, the injury severity outcome is well predicted, which was not the case in the original study. This is a significant advancement.
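To make the setup in the abstract concrete, here is a minimal sketch of model stacking over binary attribute vectors, written with scikit-learn. It is an illustration under stated assumptions, not the authors' pipeline: the data are synthetic, the outcome labels and every variable name are hypothetical, `GradientBoostingClassifier` stands in for XGBoost to avoid an extra dependency, and the logistic-regression meta-learner is an arbitrary choice. Plain accuracy is used only for brevity and is not claimed to be one of the paper's metrics.

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data (assumption): one row per incident
# report, one column per attribute, 1 = attribute present in the report.
n_reports, n_attributes = 600, 20
X = rng.integers(0, 2, size=(n_reports, n_attributes))

# Hypothetical 3-class outcome driven by three attributes plus noise.
score = 2 * X[:, 0] + 2 * X[:, 1] - 2 * X[:, 2] + rng.normal(0, 0.3, n_reports)
y = np.digitize(score, bins=[0.5, 2.5])  # class labels 0, 1, 2

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Base learners: a random forest, a gradient-boosted ensemble (standing in
# for XGBoost), and a linear SVM.
base_models = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("gb", GradientBoostingClassifier(random_state=0)),
    ("svm", LinearSVC(C=0.1, max_iter=5000)),
]

# Stacking: out-of-fold predictions of the base learners (5-fold CV on the
# training set) become the input features of the meta-learner.
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)

accuracy = stack.score(X_test, y_test)
print(f"held-out accuracy of the stacked model: {accuracy:.3f}")
```

In a real application, one such classifier would be trained per outcome (injury severity, injury type, body part impacted, incident type), with the human-annotated outcome as `y` and the NLP-extracted attributes as `X`.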

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1908.05972/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/1908.05972/full.md

## References

41 references — full list in the complete paper: https://tomesphere.com/paper/1908.05972/full.md

---
Source: https://tomesphere.com/paper/1908.05972