Multimodal Depression Classification Using Articulatory Coordination Features And Hierarchical Attention Based Text Embeddings

Nadee Seneviratne; Carol Espy-Wilson

arXiv:2202.06238·eess.AS·February 15, 2022·1 cites

Open Access

TL;DR

This paper presents a multimodal depression classification system that combines articulatory coordination features from speech with hierarchical attention-based text embeddings. It improves AUC over unimodal audio and text classifiers and remains trainable when data is limited.

Contribution

It introduces a novel multimodal depression classifier integrating articulatory coordination features and hierarchical attention text embeddings, with a multi-stage training approach for limited data scenarios.

Findings

01

AUC improvements of 7.5% and 13.7% over the unimodal audio and text classifiers, respectively

02

Session-wise prediction built on a segment-level classifier, without hindering performance when training data is limited

03

Combining embeddings from the session-level audio model and the HAN text model outperforms either modality alone
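
For readers unfamiliar with the metric behind the first finding, the following is a minimal sketch of how an area under the ROC curve (AUC) can be computed from binary labels and classifier scores. It uses the pairwise-ranking definition of AUC; the function name `roc_auc` and the toy labels and scores are illustrative assumptions, not the authors' evaluation code or data. The listing does not say whether the reported 7.5% and 13.7% gains are relative or absolute, so the sketch makes no attempt to reproduce them.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    labels: iterable of 0/1 ground-truth values (1 = depressed, by assumption).
    scores: iterable of classifier scores, higher meaning more likely positive.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): two non-depressed and two depressed sessions.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
```

The quadratic loop is fine for session-level evaluation sets of modest size; a rank-based formulation would be preferable for large ones.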

Abstract

Multimodal depression classification has gained considerable popularity in recent years. We develop a multimodal depression classification system using articulatory coordination features extracted from vocal tract variables and text transcriptions obtained from an automatic speech recognition tool. The system improves the area under the receiver operating characteristic curve (AUC) over the unimodal classifiers (by 7.5% and 13.7% for audio and text, respectively). We show that when training data is limited, a segment-level classifier can be trained first and then used to obtain a session-wise prediction without hindering performance, using a multi-stage convolutional recurrent neural network. A text model is trained using a Hierarchical Attention Network (HAN). The multimodal system is developed by combining embeddings from the session-level audio model and the HAN text model.
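
The abstract states that the multimodal system combines embeddings from the session-level audio model and the HAN text model, but it does not say how. As a rough illustration only, the sketch below assumes the simplest option: concatenating the two embedding vectors and passing the result through a single logistic output unit. The function names, the embedding sizes, and the weights are all hypothetical; the actual fusion layers in the paper may differ.

```python
import math


def fuse_embeddings(audio_emb, text_emb):
    """Concatenate the two modality embeddings (assumed fusion strategy)."""
    return list(audio_emb) + list(text_emb)


def predict_depression_prob(fused, weights, bias):
    """Hypothetical logistic output unit over the fused embedding."""
    if len(fused) != len(weights):
        raise ValueError("weights must match the fused embedding length")
    z = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical embeddings: 2 audio dimensions and 1 text dimension.
audio_emb = [0.2, -0.1]   # stands in for the session-level audio model output
text_emb = [0.5]          # stands in for the HAN text model output
fused = fuse_embeddings(audio_emb, text_emb)

# Hypothetical, untrained weights chosen only to make the example concrete.
prob = predict_depression_prob(fused, weights=[1.0, 2.0, -1.0], bias=0.0)
```

In a trained system the weights would be learned jointly with, or on top of, the two unimodal models; here they only show where the fused vector enters the classifier.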

Taxonomy

Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Music and Audio Processing