# Deep Learning for Freezing of Gait Detection: Cross-Dataset Validation Reveals Critical Deployment Gaps Between Laboratory and Daily Living Wearable Monitoring

**Authors:** Wei Lin, Sanjeet S. Grewal

PMC · DOI: 10.3390/s26041352 · 2026-02-20

## TL;DR

This study shows that deep learning models for detecting freezing of gait (FoG) in people with Parkinson's disease score near-perfectly on laboratory data but degrade sharply on daily living data (F1 0.9999 vs. 0.55), exposing a major validation gap for wearable monitoring systems.

## Contribution

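The study offers an empirical framework for evaluating the real-world deployment readiness of wearable FoG detection systems, quantifies the performance gap between laboratory and daily living settings through cross-dataset validation, and gives training strategy recommendations for handling class imbalance.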
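One way to make the laboratory-versus-daily-living comparison concrete is to compare the mean and spread of F1 across the two settings. The sketch below is illustrative only: it uses the rounded F1 values reported in the abstract, and the function names and the choice of normalization are assumptions, not the authors' stated procedure.

```python
# Rounded values taken from the abstract (mean F1 and its reported spread).
LAB_MEAN, LAB_SD = 0.9999, 0.0002      # laboratory dataset (DAPHNET)
DAILY_MEAN, DAILY_SD = 0.55, 0.26      # daily living dataset (Figshare)


def relative_gap(high: float, low: float, reference: float) -> float:
    """Difference between two scores, expressed relative to `reference`."""
    return (high - low) / reference


# Assumption: the gap may be normalized by either score; both are shown.
gap_vs_daily = relative_gap(LAB_MEAN, DAILY_MEAN, reference=DAILY_MEAN)
gap_vs_lab = relative_gap(LAB_MEAN, DAILY_MEAN, reference=LAB_MEAN)

# Spread comparison: ratio of standard deviations and ratio of variances.
sd_ratio = DAILY_SD / LAB_SD
variance_ratio = sd_ratio ** 2

print(f"gap relative to daily living F1: {gap_vs_daily:.1%}")
print(f"gap relative to laboratory F1:   {gap_vs_lab:.1%}")
print(f"SD ratio: {sd_ratio:.0f}, variance ratio: {variance_ratio:.3g}")
```

With these rounded inputs, the gap normalized by the daily living F1 comes to roughly 82%, close to the reported 83%, and the ratio of standard deviations comes to roughly 1300, close to the reported 1299-fold figure. This suggests, but does not confirm, how the headline numbers were derived; the unrounded values in the full paper would be needed to check.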

## Key findings

- Cross-dataset validation revealed an 83% performance gap between laboratory and daily living FoG detection (F1 0.9999 ± 0.0002 vs. 0.55 ± 0.26, p < 0.0001).
- F1-based early stopping outperformed AUC-based early stopping by 47% under class imbalance (F1 0.55 vs. 0.37, p = 0.0008).
- Combining multiple imbalance corrections (focal loss, weighting, sampling) paradoxically degraded precision to 33%, which the authors attribute to a roughly 60-fold over-weighting of the minority class.
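The early stopping finding can be illustrated with a minimal sketch of patience-based checkpoint selection, where the only difference between the two runs is which validation metric is monitored. The validation curves, the patience value, and all names below are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 score from confusion counts; returns 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


@dataclass
class EarlyStopper:
    """Tracks the best value of a monitored metric (higher is better)."""
    patience: int = 3
    best_value: float = float("-inf")
    best_epoch: int = -1
    epochs_without_gain: int = 0

    def update(self, epoch: int, value: float) -> bool:
        """Record a validation score; return True when training should stop."""
        if value > self.best_value:
            self.best_value = value
            self.best_epoch = epoch
            self.epochs_without_gain = 0
        else:
            self.epochs_without_gain += 1
        return self.epochs_without_gain >= self.patience


def select_epoch(history: Sequence[float], patience: int = 3) -> int:
    """Return the epoch whose checkpoint would be kept for this metric."""
    stopper = EarlyStopper(patience=patience)
    for epoch, value in enumerate(history):
        if stopper.update(epoch, value):
            break
    return stopper.best_epoch


# Hypothetical validation curves: AUC peaks early, F1 keeps improving.
val_auc = [0.80, 0.86, 0.88, 0.87, 0.87, 0.86, 0.86]
val_f1 = [0.20, 0.28, 0.33, 0.41, 0.47, 0.46, 0.45]

epoch_by_auc = select_epoch(val_auc)
epoch_by_f1 = select_epoch(val_f1)
print("checkpoint chosen by AUC:", epoch_by_auc, "with F1", val_f1[epoch_by_auc])
print("checkpoint chosen by F1: ", epoch_by_f1, "with F1", val_f1[epoch_by_f1])
```

In this toy example the AUC-monitored run keeps an earlier checkpoint whose F1 is lower than the one the F1-monitored run keeps. Because AUC is threshold-independent, it can plateau while the F1 at the operating threshold is still changing, which is one plausible reason the choice of monitored metric matters on imbalanced data.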

## Abstract

Freezing of gait (FoG) affects 38–65% of advanced Parkinson’s disease patients, yet automated detection algorithms are often validated solely on laboratory datasets. This study quantifies the critical performance gap between laboratory and real-world performance—a prerequisite for clinical deployment. Using temporal convolutional networks (TCNs), we trained models on two public datasets representing ecological extremes: a daily living dataset (Figshare; n = 35, single-sensor) and a laboratory dataset (DAPHNET; n = 10, multi-sensor). We compared five training configurations to address class imbalance. Results showed that F1-based early stopping outperformed Area Under the Curve (AUC)-based stopping by 47% (F1: 0.55 vs. 0.37, p = 0.0008). Combining multiple imbalance corrections (focal loss, weighting, sampling) paradoxically degraded precision to 33% due to a ~60-fold over-weighting of the minority class. Most importantly, cross-dataset validation revealed an 83% performance gap: laboratory F1 reached 0.9999 ± 0.0002, whereas daily living F1 dropped to 0.55 ± 0.26 (p < 0.0001), with a 1299-fold increase in variance. These findings demonstrate that laboratory success does not guarantee real-world utility. We propose that the observed gap represents a “deployment gap” reflecting the combined influence of environmental complexity, sensor constraints, and physiological variability. These results provide an empirical framework for evaluating deployment readiness of wearable FoG detection systems and offer concrete training strategy recommendations for clinical translation.
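The over-weighting effect described in the abstract can be sketched with simple arithmetic: imbalance corrections that each favor the minority class multiply rather than add. The example below assumes inverse-frequency class weights and class-balanced resampling, and uses a hypothetical FoG prevalence of 10%; none of these specifics come from the paper, and the sketch does not attempt to reproduce the reported ~60-fold figure. Focal loss adds a further, example-dependent reweighting that is not modeled here.

```python
def inverse_frequency_ratio(pos_fraction: float) -> float:
    """Minority-to-majority emphasis ratio implied by inverse-frequency
    correction when positives make up `pos_fraction` of the windows."""
    if not 0.0 < pos_fraction < 1.0:
        raise ValueError("pos_fraction must be strictly between 0 and 1")
    return (1.0 - pos_fraction) / pos_fraction


def effective_minority_emphasis(
    pos_fraction: float,
    class_weighting: bool,
    balanced_sampling: bool,
) -> float:
    """How much more one positive window contributes to the training loss
    than one negative window, per epoch, under the enabled corrections."""
    ratio = inverse_frequency_ratio(pos_fraction)
    emphasis = 1.0
    if class_weighting:
        # Loss on each positive window is scaled up by the ratio.
        emphasis *= ratio
    if balanced_sampling:
        # Each positive window is drawn more often, by the same ratio.
        emphasis *= ratio
    return emphasis


POS_FRACTION = 0.10  # hypothetical prevalence, not the datasets' actual value

for weighting, sampling in [(False, False), (True, False), (False, True), (True, True)]:
    emphasis = effective_minority_emphasis(POS_FRACTION, weighting, sampling)
    print(f"class_weighting={weighting!s:5} balanced_sampling={sampling!s:5} "
          f"-> emphasis ~ {emphasis:.0f}x")
```

Under these assumptions either correction alone gives about a 9-fold emphasis on the minority class, while applying both gives about 81-fold. A model trained with that much emphasis is pushed toward predicting the minority class, which raises false positives and lowers precision, consistent with the direction of the effect reported above.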

## Linked entities

- **Diseases:** Parkinson’s disease (MONDO:0005180)

## Full-text entities

- **Diseases:** cognitive impairment (MESH:D003072), FoG (MESH:D020234), falls (MESH:C537863), fatigue (MESH:D005221), injuries (MESH:D014947), Parkinson's disease (MESH:D010300)
- **Chemicals:** Dopaminergic (MESH:D004298)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12944384/full.md

---
Source: https://tomesphere.com/paper/PMC12944384