# The Performance of Wearable Device–Based Artificial Intelligence in Detecting Depression: Systematic Review and Meta-Analysis

**Authors:** Jiawen Liu, Junhui Wang, Zhaobin Wu, Mohamad Ibrani Shahrimin Bin Adam Assim

PMC · DOI: 10.2196/85319 · 2026-03-10

## TL;DR

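This systematic review and meta-analysis evaluates how well AI models built on wearable-device data detect depression and predict depressive episodes. Detection accuracy was high (pooled sensitivity 0.89, specificity 0.93), episode prediction was only moderate (pooled specificity 0.65), and heterogeneity and reliance on retrospective and public datasets limit generalizability.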

## Contribution

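The study provides a systematic review and meta-analysis of wearable device–based AI for depression detection and episode prediction. It reports pooled sensitivity, specificity, diagnostic odds ratio, and AUC, and identifies study design, AI method, reference standard, and input type as factors that significantly affected diagnostic accuracy.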

## Key findings

- Wearable AI models achieved high pooled sensitivity (0.89) and specificity (0.93) for depression detection.
- Random forest models showed the best performance with an AUC of 0.97 for depression detection.
- Prediction of depressive episodes (3 datasets) was moderate, with pooled sensitivity of 0.86 and pooled specificity of 0.65.

## Abstract

In recent years, advances in wearable sensor technology and artificial intelligence (AI) have provided new possibilities for detecting and monitoring depression.

This study systematically reviewed and meta-analyzed the diagnostic and predictive performance of wearable device–based AI models for detecting depression and predicting depressive episodes and explored factors influencing outcomes.

Following PRISMA-DTA (Preferred Reporting Items for a Systematic Review and Meta-Analysis of Diagnostic Test Accuracy) guidelines, the PubMed, Embase, Web of Science, and PsycINFO databases were searched from inception to May 27, 2025. Eligible studies used AI algorithms on wearable device data for depression detection or episode prediction. Sensitivity, specificity, diagnostic odds ratio, and area under the curve (AUC) were pooled using a bivariate random-effects model. Risk of bias was assessed using the Prediction Model Risk of Bias Assessment Tool plus artificial intelligence (PROBAST+AI), and certainty of evidence was assessed using the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) tool.

We included 16 studies (32 datasets) with 1189 patients and 13,593 samples. For depression detection, pooled sensitivity and specificity were 0.89 (95% CI 0.83‐0.93) and 0.93 (95% CI 0.87‐0.96), with a diagnostic odds ratio of 110.47 (95% CI 33.33‐366.17) and AUC of 0.96 (95% CI 0.94‐0.98). Random forest models showed the best performance (sensitivity=0.89, specificity=0.91, AUC=0.97). Subgroup analyses indicated that study design, AI method, reference standard, and input type significantly affected diagnostic accuracy (P<.05). For depressive episode prediction (3 datasets), pooled sensitivity was 0.86 (95% CI 0.80‐0.91), and pooled specificity was 0.65 (95% CI 0.59‐0.71). The overall risk of bias was low to moderate, with no evidence of publication bias.
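To make the relationship between the reported metrics concrete, the following minimal sketch derives likelihood ratios and a diagnostic odds ratio from a single sensitivity/specificity pair using the standard textbook definitions. It is an illustration only, not the authors' analysis: the function name `diagnostic_summary` is hypothetical, and plugging in the rounded pooled point estimates will not exactly reproduce the pooled diagnostic odds ratio of 110.47, which was estimated from the bivariate random effects model.

```python
def diagnostic_summary(sensitivity: float, specificity: float) -> dict:
    """Derive likelihood ratios and the diagnostic odds ratio (DOR)
    from one sensitivity/specificity pair.

    Hypothetical helper for illustration; not code from the paper.
    """
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")

    # Positive likelihood ratio: true-positive rate / false-positive rate.
    lr_pos = sensitivity / (1.0 - specificity)
    # Negative likelihood ratio: false-negative rate / true-negative rate.
    lr_neg = (1.0 - sensitivity) / specificity
    # DOR is the ratio of the two likelihood ratios.
    dor = lr_pos / lr_neg

    return {"lr_pos": lr_pos, "lr_neg": lr_neg, "dor": dor}


# Rounded pooled point estimates quoted in the abstract.
detection = diagnostic_summary(sensitivity=0.89, specificity=0.93)
prediction = diagnostic_summary(sensitivity=0.86, specificity=0.65)

print(f"Detection:  LR+={detection['lr_pos']:.2f}  "
      f"LR-={detection['lr_neg']:.2f}  DOR={detection['dor']:.1f}")
print(f"Prediction: LR+={prediction['lr_pos']:.2f}  "
      f"LR-={prediction['lr_neg']:.2f}  DOR={prediction['dor']:.1f}")
```

With these inputs the detection pair gives a diagnostic odds ratio of roughly 107, close to the pooled estimate, while the prediction pair gives roughly 11. The gap reflects the much lower specificity for episode prediction.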

Wearable device–based AI models achieved high accuracy for detecting depression and moderate utility in predicting episodes. However, heterogeneity, reliance on retrospective and public datasets, and lack of standardized methods limited generalizability.

## Linked entities

- **Diseases:** depression (MONDO:0002050)

## Full-text entities

- **Diseases:** anxiety (MESH:D001007), Mental Disorders (MESH:D001523), anhedonia (MESH:D059445), sleep disturbances (MESH:D012893), muscle tension (MESH:D018781), autonomic dysfunction (MESH:D001342), DOR (MESH:C566076), disorders (MESH:D009358), major depression (MESH:D003865), mood disorders (MESH:D019964), AI (MESH:C538142), cognitive impairment (MESH:D003072), low (MESH:D009800), Depression (MESH:D003866)
- **Chemicals:** TP (-)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12974932/full.md

---
Source: https://tomesphere.com/paper/PMC12974932