# Improving speech depression detection using transfer learning with wav2vec 2.0 in low-resource environments

**Authors:** Xu Zhang, Xiangcheng Zhang, Weisi Chen, Chenlong Li, Chengyuan Yu

PMC · DOI: 10.1038/s41598-024-60278-1 · Scientific Reports · 2024-04-25

## TL;DR

This paper presents a transfer learning method that detects depression from speech by fine-tuning wav2vec 2.0, reaching F1 scores of 79% on DAIC-WOZ and 90.53% on CMDC despite limited annotated data.

## Contribution

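The approach fine-tunes wav2vec 2.0 for feature representation, builds segment-level features with a 1D-CNN and attention pooling, and applies an LSTM with self-attention at the prediction stage to give greater weight to depression-related segments, targeting low-resource settings.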

## Key findings

- The model achieved an F1 score of 79% on the DAIC-WOZ dataset.
- It reached an F1 score of 90.53% on the CMDC dataset, outperforming recent baselines.
- The 1D-CNN and attention pooling layers produce segment-level features that improve the model's ability to capture temporal relationships among audio frames.
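
For readers less familiar with the metric reported above, the F1 score is the harmonic mean of precision and recall. The following is a minimal sketch of that standard definition; the function name `f1_score` and the confusion counts in the usage line are illustrative assumptions and are not taken from the paper's experiments.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall from confusion counts.

    tp: depressed speakers correctly flagged (true positives)
    fp: non-depressed speakers wrongly flagged (false positives)
    fn: depressed speakers missed (false negatives)
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only to illustrate the arithmetic.
example_f1 = f1_score(tp=8, fp=2, fn=2)
print(f"F1 = {example_f1:.2f}")
```

Because F1 ignores true negatives, it is a common choice when the positive class (here, depressed speakers) is the minority, which is one plausible reason to report it instead of plain accuracy.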

## Abstract

Depression, a pervasive global mental disorder, profoundly impacts daily lives. Despite numerous deep learning studies focused on depression detection through speech analysis, the shortage of annotated bulk samples hampers the development of effective models. In response to this challenge, our research introduces a transfer learning approach for detecting depression in speech, aiming to overcome constraints imposed by limited resources. In the context of feature representation, we obtain depression-related features by fine-tuning wav2vec 2.0. By integrating 1D-CNN and attention pooling structures, we generate advanced features at the segment level, thereby enhancing the model's capability to capture temporal relationships within audio frames. In the realm of prediction results, we integrate LSTM and self-attention mechanisms. This incorporation assigns greater weights to segments associated with depression, thereby augmenting the model's discernment of depression-related information. The experimental results indicate that our model has achieved impressive F1 scores, reaching 79% on the DAIC-WOZ dataset and 90.53% on the CMDC dataset. It outperforms recent baseline models in the field of speech-based depression detection. This provides a promising solution for effective depression detection in low-resource environments.
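
The attention pooling step mentioned in the abstract can be illustrated with a small sketch. This is a generic form of attention pooling, not the paper's exact formulation: each frame vector is scored against a scoring vector, the scores are normalized with a softmax, and the segment-level feature is the weighted sum of the frames. The names `attention_pool` and `w`, and the toy two-dimensional frames, are assumptions made for illustration; in a real model the frames would come from the wav2vec 2.0 and 1D-CNN layers and `w` would be learned.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames, w):
    """Pool T frame vectors (each of length D) into one segment vector.

    frames: list of T lists of floats (frame-level features)
    w:      scoring vector of length D (hypothetical learned parameter)
    Returns (pooled_vector, attention_weights).
    """
    # One scalar relevance score per frame: dot product with w.
    scores = [sum(x * wi for x, wi in zip(frame, w)) for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    # Weighted sum of frames, computed per feature dimension.
    pooled = [
        sum(weights[t] * frames[t][d] for t in range(len(frames)))
        for d in range(dim)
    ]
    return pooled, weights


# Toy input: three frames with two features each.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = [0.5, -0.5]
pooled, weights = attention_pool(frames, w)
print("attention weights:", [round(a, 3) for a in weights])
print("segment feature:", [round(v, 3) for v in pooled])
```

With a zero scoring vector every frame gets the same weight and the result reduces to mean pooling, which shows what the learned scores add: frames judged more relevant contribute more to the segment-level feature.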

## Linked entities

- **Diseases:** depression (MONDO:0002050)

## Full-text entities

- **Diseases:** Depression (MESH:D003866), mental disorder (MESH:D001523)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11045867/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11045867/full.md

## References

43 references — full list in the complete paper: https://tomesphere.com/paper/PMC11045867/full.md

---
Source: https://tomesphere.com/paper/PMC11045867