Integrating Large Language Models into a Tri-Modal Architecture for   Automated Depression Classification on the DAIC-WOZ

Santosh V. Patapati

arXiv:2407.19340·cs.CV·October 15, 2024

Integrating Large Language Models into a Tri-Modal Architecture for Automated Depression Classification on the DAIC-WOZ

Santosh V. Patapati

PDF

Open Access

TL;DR

This paper introduces a novel tri-modal architecture combining speech, facial expressions, and large language models for depression detection, achieving state-of-the-art accuracy on the DAIC-WOZ dataset.

Contribution

It is the first to integrate large language models into a multi-modal depression classification framework, enhancing performance over existing models.

Findings

01

Achieved 91.01% accuracy in Leave-One-Subject-Out testing.

02

Surpassed all baseline and state-of-the-art models on DAIC-WOZ.

03

Demonstrated the effectiveness of large language models in multi-modal depression detection.

Abstract

Major Depressive Disorder (MDD) is a pervasive mental health condition that affects 300 million people worldwide. This work presents a novel, BiLSTM-based tri-modal model-level fusion architecture for the binary classification of depression from clinical interview recordings. The proposed architecture incorporates Mel Frequency Cepstral Coefficients, Facial Action Units, and uses a two-shot learning based GPT-4 model to process text data. This is the first work to incorporate large language models into a multi-modal architecture for this task. It achieves impressive results on the DAIC-WOZ AVEC 2016 Challenge cross-validation split and Leave-One-Subject-Out cross-validation split, surpassing all baseline models and multiple state-of-the-art models. In Leave-One-Subject-Out testing, it achieves an accuracy of 91.01%, an F1-Score of 85.95%, a precision of 80%, and a recall of 92.86%.

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsMental Health via Writing · Machine Learning in Healthcare

MethodsAttention Is All You Need · Linear Layer · Residual Connection · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Softmax · Absolute Position Encodings · Dense Connections