Prediction of Workplace Injuries

Mehdi Sadeqi; Azin Asgarian; Ariel Sibilia

arXiv:1906.03080·cs.CY·June 10, 2019

TL;DR

This paper presents a comprehensive approach to predicting workplace injuries by addressing data imbalance, transferring knowledge across organizations, and uncovering causal factors to improve injury prevention strategies.

Contribution

It applies ensemble-based resampling methods, proposes a novel instance-based transfer learning approach, and uses association and causal analysis techniques for injury risk prediction.

Findings

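1. Ensemble-based under-sampling (RUSBoost, UnderBagging) outperforms the baseline XGBoost model on the imbalanced datasets; the SMOTE-based variants do not.

2. Instance-based transfer learning improves injury risk prediction for a target organization by reusing data from a source organization.

3. Partial dependence analysis helps estimate the effect of individual variables on injury risk while adjusting for confounders.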

Table 1: Performance of different methods on Company-1’s data.

Method	Precision	Recall	$\text{F}_{1}\text{-score}$	AUCPR
$\mathcal{A}_{T}$	0.07	0.06	0.06	0.0375
$\mathcal{A}_{S}$	0.04	0.18	0.07	0.0405
$\mathcal{A}_{S\cup T}$	0.13	0.06	0.08	0.0478
$\mathcal{A}_{\mathbf{1}}$	0.06	0.12	0.08	0.0456
$\mathcal{A}_{G}$	0.07	0.16	0.10	0.0532
$\mathcal{A}_{HW}$	0.11	0.12	0.12	0.0542

Abstract

Workplace injuries result in substantial human and financial losses. As reported by the International Labour Organization (ILO), more than 374 million work-related injuries are reported every year. In this study, we investigate the problem of injury risk prediction and prevention in a work environment. While injuries represent a significant number across all organizations, they are rare events within a single organization. Hence, collecting a sufficiently large dataset from a single organization is extremely difficult. In addition, the collected datasets are often highly imbalanced, which makes the problem harder. Finally, risk predictions need to provide additional context for injuries to be prevented. We propose and evaluate the following for a complete solution: 1) several ensemble-based resampling methods to address the class imbalance issues, 2) a novel transfer learning approach to transfer the knowledge across organizations, and 3) various techniques to uncover the association and causal effect of different variables on injury risk, while controlling for relevant confounding factors.

Keywords: Machine Learning, Causal Inference, Cost-Sensitive Learning, Transfer Learning, Imbalanced Data, Cost-Curves, Actionable Insights, Partial Dependence Plot

1 Introduction

Workplace injuries can affect workers’ lives and can impose a substantial economic burden on employees, employers, and society more generally (ILO, 2018; Sarkar et al., 2016). More than 374 million work-related injuries are reported every year, resulting in more than 2.3 million deaths annually (ILO, 2018). The ILO estimates the yearly cost of work-related injuries to the global economy at a staggering $3 trillion.

Predicting injuries and providing actionable insights on factors associated with injuries are critical for improving workplace safety. Recent research has focused on this problem in sports (Naglah et al., 2018; Rossi et al., 2018), construction (Tixier et al., 2016; Poh et al., 2018), and various workplace settings (Sánchez et al., 2011; Rivas et al., 2011; Sarkar et al., 2016). Despite introducing many interesting frameworks, these studies do not address some of the main challenges such as lack of labeled data and class imbalance issues. In addition, previous works do not investigate the causal relationships between different variables and injury incidents.

We propose a framework that employs ensemble-based resampling methods and a novel transfer learning approach to address class imbalance and data availability issues. To provide actionable insights, we visualize the average contribution of each predictive feature to injury risk, and we use partial dependence plots to estimate the causal effect of a feature while adjusting for confounders. We demonstrate the utility of our framework through experiments performed on real-world datasets. More specifically, we show that ensemble-based resampling and transfer learning techniques can increase the $\text{F}_{1}\text{-score}$ by 100% and the area under the precision-recall curve by 44%, when compared to a model trained on a single organization's dataset.

In the remainder of this paper, we first provide a brief overview of the problem and our machine learning framework in Section 2.1. We present the employed ensemble-based resampling techniques, our instance-based transfer learning method, and the approaches used to provide actionable insights in Sections 2.2, 2.3, and 2.4, respectively. Section 3 describes our results and Section 4 covers conclusions and future work.

2 Injury Prediction as Supervised Learning

2.1 Data and Problem Description

To conduct this study, we collected employees’ safety-related information from different organizations during 2016-2017. We treat the learning problem as a binary classification task: using the data collected during 2016, the objective is to predict whether an employee was injured in 2017. The collected datasets differ in size and distribution; however, they are all highly imbalanced (1-7% injury cases). In all datasets, the employee records are represented by 38 engineered features that capture two main groups of information: general employee information (e.g. age) and event-based information. Event-based information is associated either with the employee (e.g. number of absences) or with the employee’s site (e.g. the risk assessment scores). (Footnote 1: The names of the organizations are masked for confidentiality reasons.) In this work, we use XGBoost (Friedman, 2001b) as our base predictive model.

2.2 Imbalanced Data

To address the problem of highly imbalanced data, several approaches are proposed in the literature. Among the most common ones are over-sampling and under-sampling methods (Chawla, 2003), neighbor-based techniques (Wilson, 1972; Tomek, 1976), Synthetic Minority Over-sampling TEchnique (SMOTE) (Chawla et al., 2002), adjusting class weights, boosting techniques, and anomaly detection methods. From these solutions, we are particularly interested in four methods that combine ensemble-based supervised learning algorithms with resampling methods (UnderBagging, SMOTEBagging, RUSBoost, and SMOTEBoost). We give a brief overview of these methods below.

UnderBagging and SMOTEBagging rebalance the class distribution in each bag of a bagging algorithm. UnderBagging uses random under-sampling while SMOTEBagging uses SMOTE or over-sampling to achieve this goal (Galar et al., 2012). Alternatively, RUSBoost (Seiffert et al., 2010) and SMOTEBoost (Chawla et al., 2003) combine the AdaBoost.M2 (Freund & Schapire, 1997) boosting algorithm with resampling methods to address the class imbalance issues. Similar to UnderBagging and SMOTEBagging, in each iteration of training weak learners, these two algorithms respectively use random under-sampling (to reduce majority instances) and SMOTE (to increase minority instances). Moreover, these four approaches have the advantage of having very few hyper-parameters. We provide a comparison of these methods in Section 3.1.
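The per-bag rebalancing step of UnderBagging can be illustrated with a minimal sketch. It is not the implementation used in the paper: the function names are hypothetical, every minority case is kept in each bag (some variants also bootstrap the minority class), and the base learner is left out so that only the resampling and the score averaging are shown.

```python
import random

def under_bagging_indices(labels, n_bags=10, seed=0):
    """Return one balanced list of sample indices per bag (random under-sampling)."""
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    bags = []
    for _ in range(n_bags):
        # Keep every minority (injury) case and draw an equally sized
        # random subset of the majority class without replacement.
        sampled_majority = rng.sample(majority, len(minority))
        bags.append(minority + sampled_majority)
    return bags

def average_scores(per_bag_scores):
    """Average the per-sample scores produced by the models trained on each bag."""
    n_bags = len(per_bag_scores)
    return [sum(column) / n_bags for column in zip(*per_bag_scores)]

# Toy labels with 5% positives, within the 1-7% range reported for the datasets.
labels = [1] * 5 + [0] * 95
bags = under_bagging_indices(labels, n_bags=3)
```

Each bag would be used to train one base model, and `average_scores` would combine their predictions on new samples.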

2.3 Transfer Learning

To handle the data unavailability issues for a new organization (target domain), we leverage the knowledge learned from other organizations (source domain) by employing an instance-based transfer learning method. Given a set of target training samples $\{(x_{i},y_{i}) \mid i\in\{1,2,\dots,N_{T}\}\}$ and a loss function $\mathcal{L}(\cdot)$, the goal of supervised learning is to find the model $\mathcal{A}^{*}$ that minimizes the expected error, i.e., $\mathcal{A}^{*}=\operatorname*{arg\,min}_{\mathcal{A}\in\mathbb{A}}\mathbb{E}_{x\sim P_{T}}\big[\mathcal{L}(\mathcal{A}(x),y)\big]$. Here $x$ is an arbitrary sample and $P_{T}$ is the probability distribution of target samples. Following the idea of importance sampling (Liu, 2008; Asgarian et al., 2018) for transferring the knowledge from the source domain ($S$) to the target domain ($T$), we can express the expected error as $\alpha\,\mathbb{E}_{x\sim P_{T}}\big[\epsilon(x)\big]+(1-\alpha)\,\mathbb{E}_{x\sim P_{S}}\big[\epsilon(x)\tfrac{P_{T}(x)}{P_{S}(x)}\big]$. Here $\epsilon(x)$ denotes the error for each sample and $\alpha$ is a hyper-parameter that controls the overall relative importance between source and target samples. Source sample weights $\{w_{x_{j}}=\frac{P_{T}(x_{j})}{P_{S}(x_{j})}\mid j\in\{1,\dots,N_{S}\}\}$ play a major role in instance-based transfer learning methods, as they control the individual effect of source samples (Asgarian et al., 2017). We describe the different weighting approaches, including our five baseline models and our proposed weighting strategy, in the following.
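As an illustration of the weighted objective, the sketch below turns $\alpha$ and the source weights $w_{x_{j}}$ into per-sample weights and evaluates the empirical version of the expected error. The function names are hypothetical, and replacing each expectation with a sample mean is an assumption made for the example.

```python
def combined_sample_weights(n_target, source_weights, alpha):
    """Per-sample weights for the alpha-weighted target/source objective.

    Target samples share a total mass of alpha; source samples share a
    total mass of (1 - alpha), scaled by their importance weights w_j.
    """
    n_source = len(source_weights)
    target = [alpha / n_target] * n_target
    source = [(1.0 - alpha) * w / n_source for w in source_weights]
    return target, source

def weighted_error(target_errors, source_errors, source_weights, alpha):
    """alpha * mean(target errors) + (1 - alpha) * mean(w_j * source errors)."""
    t_w, s_w = combined_sample_weights(len(target_errors), source_weights, alpha)
    return (sum(w * e for w, e in zip(t_w, target_errors))
            + sum(w * e for w, e in zip(s_w, source_errors)))

# Toy per-sample errors and weights, for illustration only.
example = weighted_error([1.0, 0.0], [1.0, 1.0], [2.0, 0.0], alpha=0.5)
```

In practice the two weight lists could be concatenated and passed as sample weights to any learner that accepts them.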

Baselines: Models $\mathcal{A}_{S}$, $\mathcal{A}_{T}$, and $\mathcal{A}_{S\cup T}$, trained respectively on the source, the target, and the union of source and target, serve as minimum baselines that a transfer learning method must outperform. Our fourth baseline model is an instance-weighted model ($\mathcal{A}_{\mathbf{1}}$) with all the weights set to $1$ (i.e., $\mathbf{W}_{S}=\mathbf{1}$). This is similar to $\mathcal{A}_{S\cup T}$, except that in this model we use $\alpha$ to determine the relative overall importance between source and target samples. Our last baseline model ($\mathcal{A}_{G}$) assumes Gaussian distributions for target and source samples to evaluate the source sample weights $w_{x_{j}}$.
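A rough sketch of how the Gaussian baseline's weights could be computed is given below. It handles a single feature only and fits each Gaussian with the standard library's `statistics.NormalDist`; the covariance structure used for $\mathcal{A}_{G}$ is not stated in the text, so the one-dimensional setting and the function name are assumptions.

```python
from statistics import NormalDist

def gaussian_ratio_weights(source_values, target_values):
    """Estimate w_j = P_T(x_j) / P_S(x_j) for each source value x_j.

    Both densities are modelled as one-dimensional Gaussians fitted to
    the observed samples (a simplifying assumption for this sketch).
    """
    p_s = NormalDist.from_samples(source_values)
    p_t = NormalDist.from_samples(target_values)
    return [p_t.pdf(x) / p_s.pdf(x) for x in source_values]

# Toy feature values: source samples close to the target mean get larger weights.
weights = gaussian_ratio_weights([0.0, 1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0])
```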

Hybrid Weights: Previous methods evaluate the source sample weights solely based on their similarity to the target domain. We argue that it is also important to measure the relevance of source samples to the target task. Hence, we define weights $w_{x}=w_{domain_{x}}+w_{task_{x}}$ , where $w_{domain_{x}}$ measures the similarity of an arbitrary source sample $x$ to the target domain, while $w_{task_{x}}$ measures the importance of sample $x$ in the target task.

For evaluating $w_{domain_{x}}$, unlike the previous methods that employ generative approaches to estimate $P_{T}$ and $P_{S}$, we directly approximate the weights $w_{domain_{x}}=\frac{P_{T}(x)}{P_{S}(x)}$ with a discriminative classifier. More specifically, using $\{(x,l_{x}) \mid x\in S\cup T,\ l_{x}=1 \text{ if } x\in S,\ l_{x}=0 \text{ otherwise}\}$, we train a binary classifier (e.g., logistic regression (LR)) to differentiate source and target samples. Next, we use the learned weights of this classifier ($w_{lr}$ and $c_{lr}$) to estimate the source sample weights $w_{domain_{x}}=\frac{P_{T}(x)}{P_{S}(x)}\approx\frac{1}{\exp(x^{\top}w_{lr}+c_{lr})}$.
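The approximation follows from the form of logistic regression. With $l_{x}=1$ for source samples, the classifier models $P(S\mid x)=\sigma(z)$ with $z=x^{\top}w_{lr}+c_{lr}$, so $P(T\mid x)/P(S\mid x)=\exp(-z)$; by Bayes' rule this equals $P_{T}(x)/P_{S}(x)$ up to a constant factor that depends on the relative sizes of $S$ and $T$. A minimal sketch, assuming the coefficients have already been learned (the function name is hypothetical):

```python
import math

def domain_weight(x, w_lr, c_lr):
    """Approximate P_T(x) / P_S(x) as 1 / exp(x . w_lr + c_lr).

    w_lr and c_lr are the coefficients and intercept of a logistic
    regression trained with label 1 for source and 0 for target samples.
    """
    logit = sum(xi * wi for xi, wi in zip(x, w_lr)) + c_lr
    return math.exp(-logit)

# A sample the classifier finds source-like (positive logit) is down-weighted,
# while a target-like sample (negative logit) is up-weighted.
source_like = domain_weight([1.0], [1.0], 0.0)
target_like = domain_weight([-1.0], [1.0], 0.0)
```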

To compute $w_{task_{x}}$, we train an instance of our predictive model (XGBoost) using all samples from source and target with their corresponding injury labels. We then define $w_{task_{x}}$ to be the uncertainty of this model about sample $x$. We define the uncertainty of a model about sample $x$ to be the distance of $x$ to the decision boundary. Note that this value could be negative (thus subtracting from $w_{x}$) when the decision is incorrect, or positive (thus adding to $w_{x}$) when the decision is correct. We denote this model as $\mathcal{A}_{HW}$.
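One way to read the hybrid weight is sketched below. How the distance to the decision boundary is measured is not specified in the text, so this sketch assumes it is the gap between the predicted injury probability and a 0.5 threshold, signed by whether the prediction matches the label. The function names and the threshold are assumptions.

```python
def task_weight(p_injury, label, threshold=0.5):
    """Signed distance of a sample to the decision boundary.

    Positive when the predicted class matches the label, negative otherwise.
    Measuring distance in probability space is an assumption of this sketch.
    """
    margin = abs(p_injury - threshold)
    predicted = 1 if p_injury >= threshold else 0
    return margin if predicted == label else -margin

def hybrid_weight(w_domain, p_injury, label):
    """w_x = w_domain_x + w_task_x for one source sample."""
    return w_domain + task_weight(p_injury, label)

# A confidently correct sample gains weight; a confidently wrong one loses it.
w_correct = hybrid_weight(1.0, 0.9, 1)
w_wrong = hybrid_weight(1.0, 0.9, 0)
```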

2.4 Actionable Insights

To visualize the relationship between injuries and their predictors, we explore one method that shows the straightforward association and one that finds causal relationships. (Footnote 2: Please note that in order to infer meaningful and accurate causal relationships using this approach, we need adequately accurate models. The ensemble-based resampling methods and transfer learning are an attempt in this direction.)

To measure the association between features and the target variable, we average the contribution of each feature’s possible values to the log-odds ratio of all samples which match that value. We bin every continuous variable and treat it as categorical, so that such matching is possible for all variables. We use the xgboost_explainer package in Python, which is inspired by (Foster), to find the log-odds contribution of each feature to each sample. Next, for each discrete value of each feature, we average the sample-based contribution over all samples with matching values. This gives us a visualization of both the average impact and the direction of each variable, as seen in Figure 1. This is a unique approach for visualizing the association between safety-related variables and injuries, and it also gives an interpretation of the model. However, it must be noted that each effect size measured here is an average over the population, not a deterministic effect.
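The binning and averaging step can be sketched as follows. The per-sample log-odds contributions are assumed to be already available (for example from an explainer package). The bin edges, function names, and toy numbers are hypothetical.

```python
from collections import defaultdict

def bin_index(value, edges):
    """Index of the first bin whose upper edge is >= value (last bin otherwise)."""
    for i, edge in enumerate(edges):
        if value <= edge:
            return i
    return len(edges)

def mean_contribution_by_bin(values, contributions, edges):
    """Average the per-sample log-odds contribution of one feature within each bin."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, contribution in zip(values, contributions):
        b = bin_index(value, edges)
        sums[b] += contribution
        counts[b] += 1
    return {b: sums[b] / counts[b] for b in sorted(counts)}

# Toy example: ages binned as <=30, 31-50, >50, with made-up contributions.
ages = [22, 28, 35, 41, 58]
contribs = [0.4, 0.2, -0.1, -0.3, 0.1]
by_bin = mean_contribution_by_bin(ages, contribs, edges=[30, 50])
```

A positive average in a bin indicates that, on average, that value range pushes the predicted log-odds of injury up; a negative average pushes it down.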

Partial dependence plots (PDPs) (Friedman, 2001a) are an especially interesting and fairly natural way of adding interpretability to more complex models. PDPs show the average relationship between two variables over a population by marginalizing over the distribution of all other variables. For a trained model, this is approximated by averaging over the training data, where, unlike the marginalized variables, the variables to be plotted are held constant (Friedman, 2001a). PDPs may also yield a causal interpretation of the effect of a variable on injuries (Zhao & Hastie, 2017). Zhao & Hastie showed that a partial dependence calculation that averages over a set of variables is equivalent to controlling for those variables using Pearl’s back-door adjustment formula (Pearl, 1993). For example, there are instances where the relationship between an input variable and injuries is reversed when doing a partial dependence calculation over another variable. Pearl (2014) shows that instances of Simpson’s paradox can be properly explained when using the back-door criterion to adjust for a variable.
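The partial dependence calculation itself is short. The sketch below holds one feature fixed at each grid value, leaves all other features at their observed values, and averages the model's predictions. The toy model and its coefficients are hypothetical and stand in for a trained classifier.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction over the data with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)       # copy so the original data is untouched
            modified[feature] = value  # hold the plotted feature constant
            total += predict(modified)
        curve.append(total / len(rows))
    return curve

def toy_model(row):
    # Hypothetical risk score; a trained model's predict function would go here.
    return 0.1 + 0.001 * row["age"] - 0.002 * row["tenure"]

rows = [{"age": 25, "tenure": 1}, {"age": 50, "tenure": 20}]
pd_curve = partial_dependence(toy_model, rows, "age", grid=[30, 40])
```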

In our example, we observe the effect of age on the probability of injury after adjusting for tenure. In our causal hypothesis, both age and tenure have a causal impact on injuries. However, neither is considered to directly cause the other. Nevertheless, they are strongly correlated, so we connect their exogenous variables in a causal graph. Intuitively, this can be described as both age and tenure being caused by the “passage of time”. In our causal path from age to injury there is one back-door path, which is blocked by tenure. Therefore, tenure satisfies the back-door criterion from age to injury. In Figure 2, we plot the probability of injury versus age generated in three different ways. The direct prediction and loess (locally estimated scatterplot smoothing) plot both estimate the direct association between age and probability of injury. The partial dependence plot shows the same association, after adjusting for tenure.
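For this example, the back-door adjustment and its partial dependence estimate can be written out as follows. The notation is introduced here for illustration: $a$ is an age value, $t$ ranges over tenure values, $t_{i}$ is the observed tenure of employee $i$, and $\hat{f}$ is the trained model's predicted injury probability. Tenure is treated as the only adjustment variable, as in the causal hypothesis above.

```latex
% Back-door adjustment for the effect of age on injury, adjusting for tenure
P(\text{injury} \mid do(\text{age}=a))
  = \sum_{t} P(\text{injury} \mid \text{age}=a,\ \text{tenure}=t)\, P(\text{tenure}=t)

% Partial dependence estimate: average over the N observed tenure values
\widehat{PD}(a) = \frac{1}{N} \sum_{i=1}^{N} \hat{f}(a,\ t_{i})
```

The empirical average over $t_{i}$ plays the role of the sum weighted by $P(\text{tenure}=t)$, which is why the partial dependence curve can be read causally when the back-door criterion holds and the model is adequately accurate.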

3 Experiments

3.1 Imbalanced Data

To compare the ensemble-based resampling methods with our benchmark XGBoost model, we adopted an effective visualization technique called cost-curves (Drummond & Holte, 2006). These curves allow us to evaluate a classifier under deployment conditions defined by two important factors (class distributions and misclassification costs), which are usually unknown or vary with time. Using cost-curves, we can visualize a classifier’s performance for the whole range of these unknown factors. Receiver operating characteristic (ROC) curves are point/line duals of cost-curves and convey the same information implicitly, but they are not as visually informative. Hence, we used cost-curves in addition to our other evaluation metrics. These curves also help to find the conditions under which a classifier performs better than other classifiers, and particularly better than trivial classifiers (Drummond & Holte, 2006).
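The two quantities plotted in a cost-curve can be computed as in the sketch below, which follows the usual formulation of Drummond & Holte's cost-curves as we understand it: the x-axis is the probability cost $PC(+)$ and the y-axis is the normalized expected cost. The function names and the toy rates are assumptions.

```python
def probability_cost(p_pos, cost_fn, cost_fp):
    """PC(+): class prior and misclassification costs folded into one x-axis value."""
    numerator = p_pos * cost_fn
    return numerator / (numerator + (1.0 - p_pos) * cost_fp)

def normalized_expected_cost(fnr, fpr, pc_pos):
    """Normalized expected cost of a classifier with the given error rates."""
    return fnr * pc_pos + fpr * (1.0 - pc_pos)

# A classifier is a straight line from (0, FPR) to (1, FNR) in cost space.
pc = probability_cost(p_pos=0.05, cost_fn=10.0, cost_fp=1.0)
model_cost = normalized_expected_cost(fnr=0.4, fpr=0.1, pc_pos=pc)

# Trivial classifiers: always predict "no injury" (FNR=1, FPR=0)
# or always predict "injury" (FNR=0, FPR=1).
always_negative = normalized_expected_cost(1.0, 0.0, pc)
always_positive = normalized_expected_cost(0.0, 1.0, pc)
```

A classifier is only worth deploying at a given operating condition if its cost lies below both trivial lines at that value of $PC(+)$.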

In all datasets, RUSBoost and UnderBagging outperformed the XGBoost model in handling class imbalance (Figure 3). SMOTEBagging and SMOTEBoost, however, performed worse than XGBoost; to avoid a cluttered plot, they are not shown in Figure 3. That being said, this behavior is data-dependent, and one should test each of these algorithms to see which one best matches the data.

The cost-curve performance comparison and model selection are only needed if misclassification costs are unknown in advance or the deployment class distribution differs from the test data class distribution. If this is not the case, we can find the optimum threshold that maximizes a profit function defined by a profit matrix. This threshold is another hyper-parameter that should be optimized inside a cross-validation pipeline. In our XGBoost model, we used the profit function as the evaluation metric for early stopping for $100$ different thresholds. For simplicity, here we kept all other XGBoost parameters fixed and optimized only for the threshold. Figure 4 shows the ratio of model profit to a benchmark profit as a function of threshold values. The profit matrix and the optimum threshold value $0.1$ are shown in this figure.
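A simplified version of the threshold search is sketched below. It only sweeps thresholds over already-computed scores, whereas the procedure described above also uses the profit as an early-stopping metric during training. The profit matrix values, scores, and function names are hypothetical and are not the ones shown in Figure 4.

```python
def profit_at_threshold(scores, labels, threshold, profit):
    """Total profit when samples with score >= threshold are flagged as at-risk."""
    total = 0.0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        total += profit[(label, predicted)]
    return total

def best_threshold(scores, labels, profit, n_thresholds=100):
    """Return the threshold on an evenly spaced grid that maximizes profit."""
    grid = [i / n_thresholds for i in range(1, n_thresholds + 1)]
    return max(grid, key=lambda t: profit_at_threshold(scores, labels, t, profit))

# Hypothetical profit matrix keyed by (true label, predicted label).
PROFIT = {(1, 1): 10.0, (1, 0): -10.0, (0, 1): -1.0, (0, 0): 0.0}

scores = [0.05, 0.15, 0.30, 0.60]
labels = [0, 0, 1, 1]
threshold = best_threshold(scores, labels, PROFIT)
```

Because missed injuries are far more costly than false alarms in this made-up matrix, the selected threshold sits well below 0.5.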

3.2 Transfer Learning

In our transfer learning framework, we considered Organization-1’s dataset as the target domain and Organization-2’s dataset as the source domain. For training, we used 58,271 samples from the target (12,225) and source (46,046) training sets, and evaluated the models on 3,057 samples from the target test set. Since the datasets were highly imbalanced (1-7% injury cases), we used precision, recall, $\text{F}_{1}\text{-score}$ (macro), and area under the precision-recall curve (AUCPR) as our four evaluation metrics. The results of our quantitative evaluation are shown in Table 1.

In Table 1, we see that model $\mathcal{A}_{T}$ performs poorly, with an $\text{F}_{1}\text{-score}$ of 0.06 and an AUCPR of 0.0375. This is possibly due to data sparsity and the model's lack of expressiveness. On the other hand, $\mathcal{A}_{S}$ has a higher $\text{F}_{1}\text{-score}$ and AUCPR, but its precision is lower. Model $\mathcal{A}_{S\cup T}$ performs better in terms of $\text{F}_{1}\text{-score}$ and AUCPR than both $\mathcal{A}_{T}$ and $\mathcal{A}_{S}$. The best result is obtained with our model $\mathcal{A}_{HW}$, which increases the $\text{F}_{1}\text{-score}$ and AUCPR considerably.
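The relative improvements quoted in the introduction can be checked against Table 1 by comparing $\mathcal{A}_{HW}$ with the target-only model $\mathcal{A}_{T}$. The helper name below is hypothetical, and the inputs are the rounded values from the table, so the AUCPR figure comes out at roughly 44-45%.

```python
def relative_gain(baseline, improved):
    """Relative improvement of `improved` over `baseline`."""
    return (improved - baseline) / baseline

# Values taken from Table 1: A_T (target only) versus A_HW (hybrid weights).
f1_gain = relative_gain(0.06, 0.12)
aucpr_gain = relative_gain(0.0375, 0.0542)
```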

Figure 5 shows the AUCPR obtained with model $\mathcal{A}_{HW}$ as a function of the hyper-parameter $\alpha$. The best performance is achieved with $\alpha$ equal to 0.7. Increasing or decreasing $\alpha$ from this value results in a lower AUCPR, as it over-emphasizes the target or source samples, respectively.

4 Conclusions And Future Work

In this paper, we investigate the problem of injury risk prediction in a supervised learning framework. To improve the performance in presence of highly imbalanced data, we employ ensemble-based resampling techniques. To address the lack of labeled data, we propose an instance-based transfer learning method. Additionally, we provide actionable insights to prevent injuries and show the effectiveness of our framework experimentally. In our future work, we focus further on discovering the causal relationships from observational data as it is a key element in injury prevention.

Acknowledgment

We would like to acknowledge Parinaz Sobhani, Madalin Mihailescu, Chang Liu, and Diego Huang for their invaluable assistance and insightful comments on the initial draft of this work. We are especially grateful to our management team at Georgian Partners and Cority, including Madalin Mihailescu, Ji Chao Zhang, Stan Marsden, and David Vuong, for encouraging and supporting our research activities, without which we would have been unable to complete this work. Finally, we would like to thank Babak Taati for his direction and guidance on this study.

Bibliography

1. ILO. Safety and health at work, 2018. URL http://www.ilo.org/global/topics/safety-and-health-at-work/lang--en/index.htm.
2. Asgarian, A., Ashraf, A. B., Fleet, D., and Taati, B. Subspace selection to suppress confounding source domain information in AAM transfer learning. In IEEE IJCB, 2017.
3. Asgarian, A., Sobhani, P., Zhang, J. C., Mihailescu, M., Sibilia, A., Ashraf, A. B., and Taati, B. A hybrid instance-based transfer learning method. arXiv:1812.01063, 2018.
4. Chawla, N. V. C4.5 and imbalanced data sets: investigating the effect of sampling method, probabilistic estimate, and decision tree structure. In Proceedings of the ICML, 2003.
5. Chawla, N. V., Bowyer, K. W., Hall, L. O., and Kegelmeyer, W. P. SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 2002.
6. Chawla, N. V., Lazarevic, A., Hall, L. O., and Bowyer, K. W. SMOTEBoost: Improving prediction of the minority class in boosting. In Proceedings of the Principles of Knowledge Discovery in Databases (PKDD), 2003.
7. Drummond, C. and Holte, R. C. Cost curves: An improved method for visualizing classifier performance. Machine Learning, 2006.
8. Foster, D. xgboostExplainer: XGBoost Model Explainer. R package version 0.1.