Still Not Quite There! Evaluating Large Language Models for Comorbid Mental Health Diagnosis

Amey Hengle; Atharva Kulkarni; Shantanu Patankar; Madhumitha Chandrasekaran; Sneha D'Silva; Jemima Jacob; Rashmi Gupta

arXiv:2410.03908·cs.CL·October 8, 2024

TL;DR

This paper introduces ANGST, a new benchmark dataset for multi-label classification of depression and anxiety from social media posts, and evaluates state-of-the-art language models on this complex diagnostic task.

Contribution

The paper presents ANGST, the first benchmark for comorbid mental health diagnosis from social media, and provides a comprehensive evaluation of current language models on this challenging task.

Findings

01. GPT-4 outperforms other models but still has limited accuracy.

02. No model exceeds 72% F1 score in multi-label classification.

03. The task reveals significant challenges in applying language models to mental health diagnostics.

Abstract

In this study, we introduce ANGST, a novel, first-of-its-kind benchmark for depression-anxiety comorbidity classification from social media posts. Unlike contemporary datasets, which often oversimplify the intricate interplay between mental health disorders by treating them as isolated conditions, ANGST enables multi-label classification, allowing each post to be simultaneously identified as indicating depression and/or anxiety. Comprising 2876 posts meticulously annotated by expert psychologists and an additional 7667 silver-labeled posts, ANGST offers a more representative sample of online mental health discourse. Moreover, we benchmark ANGST using various state-of-the-art language models, ranging from Mental-BERT to GPT-4. Our results provide significant insights into the capabilities and limitations of these models in complex diagnostic scenarios. While GPT-4 generally…
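To make the multi-label setup concrete, the following is a minimal sketch of how a per-label and macro-averaged F1 score could be computed when each post carries two independent binary labels (depression, anxiety). The toy labels, the function names `f1_for_label` and `macro_f1`, and the choice of macro averaging are assumptions for illustration only; this summary does not state which averaging scheme the paper uses.

```python
from typing import List, Sequence, Tuple

# Each post has two independent binary labels: (depression, anxiety).
# A comorbid post is (1, 1); a post indicating neither is (0, 0).
LABELS: Tuple[str, str] = ("depression", "anxiety")
Label = Tuple[int, int]


def f1_for_label(y_true: Sequence[Label], y_pred: Sequence[Label], idx: int) -> float:
    """F1 for a single label column, treating it as a binary task."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t[idx] == 1 and p[idx] == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t[idx] == 0 and p[idx] == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t[idx] == 1 and p[idx] == 0)
    denom = 2 * tp + fp + fn
    # Convention assumed here: F1 is 0.0 when the label never occurs
    # in either the gold labels or the predictions.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Sequence[Label], y_pred: Sequence[Label]) -> float:
    """Unweighted mean of the per-label F1 scores."""
    scores: List[float] = [f1_for_label(y_true, y_pred, i) for i in range(len(LABELS))]
    return sum(scores) / len(scores)


# Hypothetical toy data (not taken from ANGST).
y_true: List[Label] = [(1, 0), (0, 1), (1, 1), (0, 0)]
y_pred: List[Label] = [(1, 0), (0, 1), (1, 0), (0, 1)]

if __name__ == "__main__":
    for i, name in enumerate(LABELS):
        print(name, f1_for_label(y_true, y_pred, i))
    print("macro", macro_f1(y_true, y_pred))
```

In this toy example the model misses the anxiety label on the comorbid post and adds a spurious anxiety label to the control post, so the depression score stays perfect while the anxiety score drops, which is the kind of per-label breakdown a multi-label benchmark makes visible.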

Taxonomy

Topics: Mental Health via Writing

Methods: Attention Is All You Need · Dense Connections · Adam · Linear Layer · Residual Connection · Position-Wise Feed-Forward Layer · Label Smoothing · Dropout · Byte Pair Encoding · Absolute Position Encodings