Overcoming Imbalanced Safety Data Using Extended Accident Triangle
Kailai Sun, Tianxiang Lan, Yang Miang Goh, and Yueng-Hsiang Huang

TL;DR
This paper addresses the challenge of imbalanced safety datasets in workplace incident prediction by extending accident triangle theory and proposing weighted oversampling methods, leading to improved machine learning performance.
Contribution
It introduces a novel approach to handle imbalanced safety data by weighting samples based on accident characteristics, and shares new datasets and code for further research.
Findings
Robust improvements across multiple machine learning algorithms
Effective oversampling methods based on accident characteristics
Provision of open-source imbalanced safety datasets
Abstract
There is growing interest in using safety analytics and machine learning to support the prevention of workplace incidents, especially in high-risk industries like construction and trucking. Although existing safety analytics studies have made remarkable progress, they suffer from imbalanced datasets, a common problem in safety analytics, resulting in prediction inaccuracies. This can lead to management problems, e.g., incorrect resource allocation and improper interventions. To overcome the imbalanced data problem, we extend the theory of accident triangle to claim that the importance of data samples should be based on characteristics such as injury severity, accident frequency, and accident type. Thus, three oversampling methods are proposed based on assigning different weights to samples in the minority class. We find robust improvements among different machine learning algorithms.…
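The idea of weighting minority-class samples during oversampling can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `weighted_oversample`, the binary-label setup, and the severity weights in the usage lines are all hypothetical, chosen only to show how per-sample weights could bias which minority samples get duplicated.

```python
import numpy as np

def weighted_oversample(X, y, weights, minority_label=1, rng=None):
    """Duplicate minority-class rows until both classes have equal size.

    Rows are drawn with probability proportional to `weights`, so samples
    considered more important (e.g. more severe injuries) are copied more
    often. Assumes a binary problem; all names here are illustrative.
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X)
    y = np.asarray(y)
    weights = np.asarray(weights, dtype=float)

    minority_idx = np.flatnonzero(y == minority_label)
    n_majority = len(y) - len(minority_idx)
    n_extra = n_majority - len(minority_idx)
    if n_extra <= 0 or len(minority_idx) == 0:
        # Already balanced, or nothing to sample from.
        return X.copy(), y.copy()

    # Normalise the minority weights into sampling probabilities.
    w = weights[minority_idx]
    p = w / w.sum()
    drawn = rng.choice(minority_idx, size=n_extra, replace=True, p=p)

    # Keep every original row and append the drawn duplicates.
    keep = np.concatenate([np.arange(len(y)), drawn])
    return X[keep], y[keep]

# Usage with made-up data: 8 non-incident rows, 2 incident rows.
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.array([0] * 8 + [1] * 2)
# Hypothetical weights: the last incident is treated as three times as
# important as the other (for example, a more severe injury).
w_demo = np.ones(10)
w_demo[9] = 3.0
X_bal, y_bal = weighted_oversample(X_demo, y_demo, w_demo, rng=0)
```

With uniform weights this reduces to plain random oversampling; the weights are the only place where accident characteristics such as severity, frequency, or type would enter.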
