# A Selective RAG-Enhanced Hybrid ML-LLM Framework for Efficient and Explainable Fatigue Prediction Using Wearable Sensor Data

**Authors:** Soonho Ha, Taeyoung Lee, Hyungjun Seo, Sujung Yoon, Hwamin Lee

PMC · DOI: 10.3390/bioengineering13010058 · Bioengineering · 2026-01-03

## TL;DR

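A hybrid framework pairs conventional machine-learning classifiers with a retrieval-augmented large language model that is consulted only for borderline predictions, improving the accuracy and interpretability of fatigue prediction from wearable data in high-stress occupations.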

## Contribution

The paper introduces a selective RAG-enhanced hybrid ML-LLM framework that invokes the LLM only for borderline ML outputs (0.45 ≤ p ≤ 0.55), improving both classification performance and interpretability in fatigue prediction.

## Key findings

- In the uncertainty region (borderline ML outputs, 0.45 ≤ p ≤ 0.55), the hybrid framework raised accuracy from 0.556 to 0.617 and F1 from 0.659 to 0.725.
- On the full test set, accuracy improved from 0.707 to 0.718, precision from 0.739 to 0.741, and recall from 0.918 to 0.937 (McNemar's test, p < 0.05).
- SHAP and LLM reasoning analyses independently identified short-term sleep duration and heart-rate variability as the dominant predictors of fatigue.
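
The F1 values reported for this paper can be cross-checked against the precision and recall figures given in its abstract, since F1 is the harmonic mean of the two. The sketch below is only an arithmetic consistency check, not the authors' evaluation code; the function name `f1_score` and the dictionary labels are illustrative. Because the published inputs are rounded to three decimals, the recomputed values may differ from the reported ones in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1), copied from the abstract.
reported = {
    "uncertainty region, ML only": (0.684, 0.635, 0.659),
    "uncertainty region, hybrid": (0.703, 0.748, 0.725),
    "full test set, ML only": (0.739, 0.918, 0.819),
    "full test set, hybrid": (0.741, 0.937, 0.827),
}

for name, (p, r, f1) in reported.items():
    print(f"{name}: recomputed F1 = {f1_score(p, r):.3f}, reported = {f1:.3f}")
```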

## Abstract

Fatigue is a multifactorial phenomenon affecting both physical and psychological performance, particularly in high-stress occupations. Although wearable sensors enable continuous monitoring, conventional machine-learning (ML) models can produce unstable, weakly calibrated, and opaque predictions in real-world settings. To improve reliability and interpretability, we developed a selective Retrieval-Augmented Generation (RAG)–enhanced hybrid ML–LLM framework that integrates the efficiency of ML with the reasoning capability of large language models (LLMs). Using wearable and ecological momentary assessment data from 297 emergency responders (9543 seven-day windows), logistic regression, XGBoost, and LSTM models were trained to classify fatigue levels dichotomized by the median of daily tiredness scores. The LLM was selectively activated only for borderline ML outputs (0.45 ≤ p ≤ 0.55), using symbolic rules and retrieved analog examples. In the uncertainty region, performance improved from 0.556/0.684/0.635/0.659 to 0.617/0.703/0.748/0.725 (accuracy/precision/recall/F1). On the full test set, performance similarly improved from 0.707/0.739/0.918/0.819 to 0.718/0.741/0.937/0.827, with gains confirmed by McNemar’s paired comparison test (p < 0.05). SHAP-based ML interpretation and LLM reasoning analyses independently identified short-term sleep duration and heart-rate variability as dominant predictors, providing transparent explanations for model behavior. This framework enhances classification robustness, interpretability, and efficiency, offering a scalable solution for real-world fatigue monitoring.
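
The selective activation described in the abstract can be pictured as a simple routing step. The following is a minimal sketch under stated assumptions, not the authors' implementation: the 0.45 to 0.55 band comes from the abstract, while `hybrid_predict`, `llm_classify`, `toy_llm`, the `sleep_hours` feature, and the 0.5 cut-off used outside the band are hypothetical names and choices introduced here for illustration. In the paper, the LLM step draws on symbolic rules and retrieved analog examples; here it is replaced by a stand-in callable.

```python
from typing import Callable, Dict, List, Sequence

# Borderline band stated in the abstract (0.45 <= p <= 0.55).
LOW, HIGH = 0.45, 0.55


def hybrid_predict(
    probs: Sequence[float],
    features: Sequence[Dict[str, float]],
    llm_classify: Callable[[Dict[str, float]], int],
) -> List[int]:
    """Keep the ML label for confident outputs; defer borderline ones.

    `llm_classify` is a hypothetical callable standing in for the
    RAG-enhanced LLM step; it receives one sample and returns 0 or 1.
    """
    labels = []
    for p, x in zip(probs, features):
        if LOW <= p <= HIGH:
            # Borderline ML output: hand the sample to the LLM step.
            labels.append(llm_classify(x))
        else:
            # Confident ML output: threshold at 0.5 (an assumption).
            labels.append(int(p > 0.5))
    return labels


def toy_llm(x: Dict[str, float]) -> int:
    """Hypothetical stand-in rule: short sleep implies high fatigue."""
    return 1 if x["sleep_hours"] < 6 else 0


samples = [{"sleep_hours": 5.0}, {"sleep_hours": 5.0}, {"sleep_hours": 8.0}]
print(hybrid_predict([0.20, 0.50, 0.90], samples, toy_llm))
```

Only the second sample falls inside the band, so only that one reaches the stand-in LLM. This is the source of the efficiency argument: the more expensive model runs on a small fraction of windows.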

## Full-text entities

- **Diseases:** Fatigue (MESH:D005221)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12838294/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12838294/full.md

## References

31 references — full list in the complete paper: https://tomesphere.com/paper/PMC12838294/full.md

---
Source: https://tomesphere.com/paper/PMC12838294