# Scalable depression monitoring with smartphone speech using a multimodal benchmark and topic analysis

**Authors:** Daniel Emden, Maike Richter, Astrid Chevance, Ramona Leenings, Julian Herpertz, Lara Gutfleisch, Anna Fleuchaus, Rogério Blitz, Vincent L. Holstein, Janik Goltermann, Nils R. Winter, Jennifer Spanagel, Susanne Meinert, Tiana Borgers, Kira Flinkenflügel, Frederike Stein, Nina Alexander, Hamidreza Jamalabadi, Jonathan Repple, Christian Dobel, Elisabeth J. Leehr, Ronny Redlich, Ulrich W. Ebner-Priemer, Igor Nenadić, Tilo Kircher, Udo Dannlowski, Tim Hahn, Nils Opel

PMC · DOI: 10.1038/s41746-026-02486-9 · NPJ Digital Medicine · 2026-02-28

## TL;DR

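This study tests whether speech recorded on smartphones can track depression severity. Using 3151 weekly voice diaries from 284 German-speaking adults, sentence-embedding language models predicted Beck Depression Inventory (BDI) scores better than lexical and acoustic baselines, and BERTopic topic analysis was used to aid interpretation.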

## Contribution

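The paper benchmarks sentence-embedding, lexical, and acoustic representations of smartphone voice diaries for predicting BDI scores, and pairs the embeddings with BERTopic topic modeling for interpretation. The dataset comprises 3151 weekly diaries from 284 German-speaking adults (128 with MDD, 156 controls).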

## Key findings

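- Sentence-embedding models outperformed acoustic and lexical baselines in predicting BDI scores: Qwen3-8B reached MAE 4.65 and R² 0.34, and stacked generalization of multilingual-E5 with Qwen3-8B reached MAE 4.37 and R² 0.41.
- Audio embeddings added little incremental value.
- In the MDD-only analysis, multilingual-E5 was the top single modality (MAE 6.74, R² 0.20).
- BERTopic identified six coherent themes in the speech; BDI scores were highest for "Distress & care".
- LLM embeddings paired with lightweight topic analysis offer a scalable route to ecologically valid depression monitoring.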
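The topic-level comparison can be illustrated with a small sketch that groups diary-level scores by assigned topic and ranks topics by mean score. This is not the authors' pipeline: the function name `mean_score_by_topic` is hypothetical, the scores are invented toy values, and every topic label other than "Distress & care" is a placeholder, since the summary does not name the other five themes.

```python
from collections import defaultdict
from statistics import mean


def mean_score_by_topic(topics, scores):
    """Return (topic, mean score) pairs, highest mean first."""
    grouped = defaultdict(list)
    for topic, score in zip(topics, scores):
        grouped[topic].append(score)
    return sorted(
        ((topic, mean(values)) for topic, values in grouped.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )


# Toy inputs (assumed, not study data): one topic label and one BDI score per diary.
topics = [
    "Distress & care",
    "placeholder topic A",
    "Distress & care",
    "placeholder topic B",
    "placeholder topic A",
]
bdi = [24, 6, 30, 11, 8]

ranking = mean_score_by_topic(topics, bdi)
for topic, avg in ranking:
    print(f"{topic}: mean BDI {avg:.1f}")
```

In practice the topic labels would come from a topic model fitted on the diary transcripts; here they are hard-coded so the grouping step can be read in isolation.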

## Abstract

Objective, scalable biomarkers are needed for continuous monitoring of major depressive disorder. Smartphone-collected speech is promising, yet clinically useful signals remain elusive. We analyzed 3151 weekly voice diaries from 284 German-speaking adults (128 MDD, 156 controls) to predict Beck Depression Inventory (BDI) scores. Sentence-embedding models outperformed lexical and acoustic baselines: Qwen3-8B achieved MAE 4.65 and R² 0.34, and stacked generalization of multilingual-E5 with Qwen3-8B further improved performance (MAE 4.37, R² 0.41). Audio embeddings added little incremental value. In an MDD-only analysis, multilingual-E5 was the top single modality (MAE 6.74, R² 0.20). To aid interpretation, BERTopic uncovered six coherent themes; BDI scores were highest for “Distress & care”, supporting clinical face validity. Together, LLM embeddings paired with lightweight topic analysis capture the dominant signal of depression severity in everyday speech and offer a scalable route to ecologically valid digital phenotyping.
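The abstract reports mean absolute error (MAE) and the coefficient of determination for single models and for a stacked combination of two embedding models. The sketch below shows the general idea of stacked generalization on synthetic data, assuming NumPy is available. It is an illustration only, not the authors' implementation: the two feature matrices `X_a` and `X_b` are random stand-ins for two embedding views, the base and meta learners are closed-form ridge regressions chosen for brevity, and the folds ignore participant identity (with repeated diaries per person, folds would normally be grouped by participant).

```python
import numpy as np


def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w


def ridge_predict(model, X):
    w, b = model
    return X @ w + b


def out_of_fold(X, y, folds, alpha=1.0):
    """Predict every sample with a model that did not see it during fitting."""
    oof = np.empty_like(y, dtype=float)
    for k in np.unique(folds):
        held_out = folds == k
        model = ridge_fit(X[~held_out], y[~held_out], alpha)
        oof[held_out] = ridge_predict(model, X[held_out])
    return oof


def mae(y, y_hat):
    return float(np.mean(np.abs(y - y_hat)))


def r2(y, y_hat):
    return float(1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2))


# Synthetic data (assumed): each view carries one of two complementary signals.
rng = np.random.default_rng(0)
n, n_train = 300, 200
latent = rng.normal(size=(n, 2))
y = 10 + 4 * latent[:, 0] + 3 * latent[:, 1] + rng.normal(size=n)
X_a = np.column_stack([latent[:, 0] + 0.3 * rng.normal(size=n), rng.normal(size=(n, 4))])
X_b = np.column_stack([latent[:, 1] + 0.3 * rng.normal(size=n), rng.normal(size=(n, 4))])

train = np.arange(n) < n_train
test = ~train
folds = np.arange(n_train) % 5  # 5 folds over the training rows

scores, oof_cols, test_cols = {}, [], []
for name, X in {"view_a": X_a, "view_b": X_b}.items():
    # Level 0: out-of-fold predictions become the meta-learner's training features.
    oof_cols.append(out_of_fold(X[train], y[train], folds))
    # Refit on all training rows to produce test-time base predictions.
    pred = ridge_predict(ridge_fit(X[train], y[train]), X[test])
    test_cols.append(pred)
    scores[name] = (mae(y[test], pred), r2(y[test], pred))

# Level 1: the meta-learner combines the base predictions.
meta = ridge_fit(np.column_stack(oof_cols), y[train], alpha=1e-6)
stacked = ridge_predict(meta, np.column_stack(test_cols))
scores["stacked"] = (mae(y[test], stacked), r2(y[test], stacked))

for name, (m, r) in scores.items():
    print(f"{name}: MAE={m:.2f} R2={r:.2f}")
```

Fitting the meta-learner on out-of-fold predictions rather than in-sample predictions keeps it from rewarding whichever base model overfits the training rows most.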

## Linked entities

- **Diseases:** major depressive disorder (MONDO:0002009), depression (MONDO:0002050)

## Full-text entities

- **Diseases:** MDD (MESH:D003865), Depression (MESH:D003866)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12996298/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12996298/full.md

## References

12 references — full list in the complete paper: https://tomesphere.com/paper/PMC12996298/full.md

---
Source: https://tomesphere.com/paper/PMC12996298