DEPA: Self-Supervised Audio Embedding for Depression Detection
Pingyue Zhang, Mengyue Wu, Heinrich Dinkel, Kai Yu

TL;DR
This paper introduces DEPA, a self-supervised audio embedding extracted at the response level, which improves depression detection performance on both sparse and large datasets.
Contribution
DEPA is a novel self-supervised pretrained audio embedding method specifically designed for depression detection, leveraging response-level features.
Findings
Significant performance improvements on depression detection datasets
Effective on both sparse and large datasets
Demonstrates the potential of self-supervised learning in audio-based mental health assessment
Abstract
Depression detection research has increased over the last few decades, but limited data availability and representation learning remain a major bottleneck. Recently, self-supervised learning has seen success in pretraining text embeddings and has been applied broadly to related tasks with sparse data, while pretrained audio embeddings based on self-supervised learning are rarely investigated. This paper proposes DEPA, a self-supervised, pretrained depression audio embedding method for depression detection. An encoder-decoder network is used to extract DEPA on in-domain depression datasets (DAIC and MDD) and out-of-domain datasets (Switchboard, Alzheimer's). With DEPA as the audio embedding extracted at the response level, a significant performance gain is achieved on downstream tasks, evaluated on both a sparse dataset (DAIC) and a large major depression disorder dataset (MDD). This…
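The abstract's idea of pretraining an encoder-decoder without labels and then using the encoder output as one embedding per response can be illustrated with a minimal sketch. This is not the authors' architecture or training objective: a tiny linear autoencoder trained on frame reconstruction stands in for the encoder-decoder network, mean pooling over frames is an assumed way to obtain a response-level vector, and the names `train_autoencoder` and `response_embedding` are hypothetical.

```python
import numpy as np

def train_autoencoder(frames, dim=8, lr=0.01, epochs=200, seed=0):
    """Fit a linear encoder-decoder by gradient descent on reconstruction error.

    frames: array of shape (n_frames, n_feats), e.g. spectrogram frames.
    Assumption: a linear autoencoder stands in for the paper's network.
    """
    rng = np.random.default_rng(seed)
    n_feats = frames.shape[1]
    W_enc = rng.normal(scale=0.1, size=(n_feats, dim))
    W_dec = rng.normal(scale=0.1, size=(dim, n_feats))
    for _ in range(epochs):
        hidden = frames @ W_enc            # encode each frame
        err = hidden @ W_dec - frames      # reconstruction residual
        # Gradients of the mean squared reconstruction error (up to a constant).
        grad_dec = hidden.T @ err / len(frames)
        grad_enc = frames.T @ (err @ W_dec.T) / len(frames)
        W_dec -= lr * grad_dec
        W_enc -= lr * grad_enc
    return W_enc, W_dec

def response_embedding(frames, W_enc):
    """Mean-pool frame-level encoder outputs into one vector per response.

    Assumption: mean pooling is used here only for illustration.
    """
    return (frames @ W_enc).mean(axis=0)
```

In this sketch no depression labels are needed for pretraining; a downstream classifier would then be trained on the pooled vectors, one per interview response.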
