TL;DR
This paper presents a novel multi-modal deep learning approach combining Transformer and CNN models for depression detection from clinical interviews, achieving state-of-the-art results on the Distress Analysis Interview Corpus.
Contribution
It introduces a multi-modal framework that pairs a pre-trained Transformer text model, trained with data augmentation based on topic modeling, with a deep 1D CNN acoustic model, improving depression detection performance over existing methods.
Findings
The Transformer-based text model achieved state-of-the-art performance for the text modality.
The deep 1D CNN achieved state-of-the-art performance for the audio modality.
The multi-modal combination outperformed the single-modality models.
Abstract
In this study, we focus on automated approaches to detect depression from clinical interviews using multi-modal machine learning (ML). Our approach differs from other successful ML methods for depression detection on the Distress Analysis Interview Corpus, such as context-aware analysis through feature engineering and end-to-end deep neural networks. We propose a novel method that incorporates: (1) a pre-trained Transformer combined with data augmentation based on topic modeling for textual data; and (2) a deep 1D convolutional neural network (CNN) for acoustic feature modeling. The simulation results demonstrate the effectiveness of the proposed method for training multi-modal deep learning models. Our deep 1D CNN and Transformer models achieved state-of-the-art performance for the audio and text modalities, respectively. Combining them in a multi-modal framework also outperforms…
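To make the "deep 1D CNN for acoustic feature modeling" idea concrete, here is a minimal pure-Python sketch of the core operation: one convolutional filter sliding along the time axis of a sequence of acoustic feature frames, followed by ReLU and global max pooling. This is an illustration under assumptions, not the paper's architecture. The summary does not state the layer count, kernel sizes, or input features, and the names `conv1d_relu` and `global_max_pool` are hypothetical.

```python
from typing import List


def conv1d_relu(frames: List[List[float]],
                kernel: List[List[float]],
                bias: float = 0.0) -> List[float]:
    """Slide one filter along the time axis (valid padding, stride 1).

    frames: T time steps, each a list of C acoustic features.
    kernel: K time steps, each a list of C weights (one output channel).
    Returns T - K + 1 activations after ReLU.
    """
    t, k = len(frames), len(kernel)
    if t < k:
        raise ValueError("sequence is shorter than the kernel")
    activations = []
    for start in range(t - k + 1):
        total = bias
        # Weighted sum over the K frames covered by the filter.
        for offset in range(k):
            for x, w in zip(frames[start + offset], kernel[offset]):
                total += x * w
        activations.append(max(0.0, total))  # ReLU
    return activations


def global_max_pool(activations: List[float]) -> float:
    """Collapse the time axis to one value per filter."""
    return max(activations)


# Toy input (assumed values): 4 frames with 2 features each.
frames = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [0.0, 3.0]]
# One filter spanning 2 frames (assumed weights).
kernel = [[1.0, 0.0], [0.0, 1.0]]

pooled = global_max_pool(conv1d_relu(frames, kernel))
```

A deep 1D CNN stacks many such filters and layers. The pooled values would then feed a classifier, typically through linear layers and a softmax.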
Taxonomy
Methods: Byte Pair Encoding · Linear Layer · Absolute Position Encodings · Dropout · Softmax · Attention Is All You Need · Dense Connections · Residual Connection · Multi-Head Attention · Adam
