Speech based Depression Severity Level Classification Using a Multi-Stage Dilated CNN-LSTM Model
Nadee Seneviratne, Carol Espy-Wilson

TL;DR
This paper introduces a multi-stage dilated CNN-LSTM model for classifying depression severity from speech, utilizing articulatory coordination features to achieve more detailed and accurate assessments than binary classification methods.
Contribution
It formulates depression classification as a severity level task and proposes a novel multi-stage CNN-LSTM approach using articulatory coordination features for improved accuracy.
Findings
27.47% improvement in session-level classification accuracy using ACFs from TVs
Segment-wise classifier performance is enhanced when combined with session-wise classifier
ACFs from TVs outperform MFCCs in depression severity classification
Abstract
Speech-based depression classification has gained immense popularity in recent years. However, most classification studies have focused on binary classification to distinguish depressed subjects from non-depressed subjects. In this paper, we formulate the depression classification task as a severity level classification problem to provide more granularity to the classification outcomes. We use articulatory coordination features (ACFs) developed to capture the changes in neuromotor coordination that happen as a result of psychomotor slowing, a necessary feature of Major Depressive Disorder. The ACFs derived from the vocal tract variables (TVs) are used to train a dilated Convolutional Neural Network based depression classification model to obtain segment-level predictions. Then, we propose a Recurrent Neural Network based approach to obtain session-level predictions from…
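To make the "dilated" part of the segment-level model concrete, here is a minimal pure-Python sketch of a 1-D dilated convolution and of how stacking dilated layers widens the receptive field. It is an illustration of the general technique only, not the authors' architecture: the function names, the kernel, the toy signal, and the dilation rates (1, 2, 4) are all assumptions chosen for the example.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid-mode 1-D dilated convolution.

    Implemented as cross-correlation (no kernel flip), which is the
    convention most deep learning libraries use for "convolution".
    """
    # Number of input samples covered by one output sample.
    span = (len(kernel) - 1) * dilation + 1
    if len(signal) < span:
        return []
    return [
        sum(kernel[k] * signal[t + k * dilation] for k in range(len(kernel)))
        for t in range(len(signal) - span + 1)
    ]


def receptive_field(kernel_size, dilations):
    """Receptive field of a stack of stride-1 dilated conv layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Hypothetical toy input: a linear ramp standing in for one feature channel.
signal = [0, 1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]  # illustrative difference kernel, not a learned filter

dense = dilated_conv1d(signal, kernel, dilation=1)  # taps 2 samples apart
wide = dilated_conv1d(signal, kernel, dilation=2)   # taps 4 samples apart

# Three layers with kernel size 3 and assumed dilations 1, 2, 4.
rf = receptive_field(3, [1, 2, 4])
```

With the same three-tap kernel, a larger dilation compares samples that are further apart in time without adding parameters, which is the usual reason for using dilation when the features encode coordination across different time scales.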
