# Machine learning prediction of long-term sickness absence due to mental disorders using Brief Job Stress Questionnaire data

**Authors:** Shinichi Iwasaki, Yasuhiko Deguchi, Shohei Okura, Kunio Maekubo, Ayaka Matsunaga, Koki Inoue

PMC · DOI: 10.1038/s41598-025-32857-3 · Scientific Reports · 2025-12-16

## TL;DR

This study uses machine learning to predict long-term sickness absence due to mental disorders in Japanese public servants using job stress data.

## Contribution

The study compares five machine learning models combined with six sampling methods for predicting long-term sickness absence due to mental disorders (LTSA-MD) under severe class imbalance.

## Key findings

- Gradient boosted trees with bootstrap oversampling achieved the highest average precision (AP) of all model and sampling combinations, with an AP of 0.040 and a ROC-AUC of 0.81.
- No significant difference was found between top-performing machine learning and sampling method combinations.
- Predictive ability for LTSA-MD remains low, indicating a need for further research.
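The gap between a ROC-AUC of 0.81 and an AP of 0.040 is typical of rare outcomes, and a small worked example may make it concrete. The sketch below is illustrative only and is not the authors' evaluation code: `average_precision` and `roc_auc` are hypothetical helper names, the toy labels and scores are invented, and ties between scores are not grouped the way a library implementation might group them.

```python
def average_precision(y_true, scores):
    """Mean of the precision values at the rank of each positive case.

    Assumes binary labels (1 = event) and ignores tied scores.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positives")
    # Rank cases from highest to lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_positives = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            true_positives += 1
            total += true_positives / rank  # precision at this rank
    return total / n_pos


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative.

    Tied pairs count as half a win.
    """
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one case of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (invented): 1 event among 100 cases, ranked 5th by the model.
toy_scores = [100 - i for i in range(100)]
toy_labels = [0] * 100
toy_labels[4] = 1

toy_ap = average_precision(toy_labels, toy_scores)   # 1/5 = 0.2
toy_auc = roc_auc(toy_labels, toy_scores)            # 95/99, about 0.96
```

In this toy case the model ranks the single event above 95 of 99 non-events, so ROC-AUC is high, yet four of the top five flagged cases are false alarms, so AP is only 0.2. For a classifier with no skill, AP is roughly the prevalence of the positive class, so an AP is best read against the event rate in the data.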

## Abstract

Long-term sickness absence (LTSA) is a significant issue, causing productivity decline, financial difficulties, and increased mental health issues, with mental disorders being the most common cause. Occupational stressors are also linked to an increased risk of LTSA due to mental disorders (LTSA-MD). This study uses occupational stressor data, assessed with the Brief Job Stress Questionnaire, to predict LTSA-MD with machine learning and sampling methods and to assess their performance. We analyzed data from 231,425 Japanese public servants from 2011 to 2022, focusing on LTSA-MD incidents. We compared five machine learning models and six sampling methods (random sampling, equal-size sampling, SMOTE (synthetic minority oversampling technique), bootstrapping, borderline-SMOTE, and ADASYN (adaptive synthetic sampling)) to predict LTSA-MD incidents while addressing class imbalance. Given the severe class imbalance, we prioritized average precision (AP) to identify the most promising model–sampling combinations. The gradient boosted trees model with bootstrap oversampling showed the highest AP among all combinations of machine learning and sampling methods, with an AP of 0.040 and a ROC-AUC of 0.81. However, no significant difference was observed among the combinations with the highest AP. The results demonstrate that the predictive ability of machine learning models for LTSA-MD is generally low and that further research is required.

The online version contains supplementary material available at 10.1038/s41598-025-32857-3.
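As a rough illustration of the bootstrapping approach named in the abstract, the sketch below oversamples the minority class by drawing its rows with replacement until both classes are the same size. This is one common reading of bootstrap oversampling; the summary does not describe the authors' exact procedure or target class ratio, and `bootstrap_oversample` is a hypothetical helper name.

```python
import random


def bootstrap_oversample(X, y, seed=0):
    """Balance a binary data set by resampling the minority class with replacement.

    Assumption: classes are balanced 1:1; the paper's ratio may differ.
    """
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    if not minority:
        raise ValueError("cannot oversample a class with no examples")
    # Draw extra minority indices with replacement (the bootstrap step).
    extra = rng.choices(minority, k=len(majority) - len(minority))
    indices = list(range(len(y))) + extra
    rng.shuffle(indices)
    return [X[i] for i in indices], [y[i] for i in indices]


# Toy data (invented): 1 event among 10 cases.
X_toy = [[i] for i in range(10)]
y_toy = [1] + [0] * 9
X_bal, y_bal = bootstrap_oversample(X_toy, y_toy)
```

Resampling of this kind is normally applied to the training split only, so that validation and test data keep the original event rate and metrics such as AP stay interpretable.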

## Full-text entities

- **Diseases:** mental disorders (MESH:D001523)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12830388/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12830388/full.md

## References

7 references — full list in the complete paper: https://tomesphere.com/paper/PMC12830388/full.md

---
Source: https://tomesphere.com/paper/PMC12830388