Still Not Quite There! Evaluating Large Language Models for Comorbid Mental Health Diagnosis
Amey Hengle, Atharva Kulkarni, Shantanu Patankar, Madhumitha Chandrasekaran, Sneha D'Silva, Jemima Jacob, Rashmi Gupta

TL;DR
This paper introduces ANGST, a new benchmark dataset for multi-label classification of depression and anxiety from social media posts, and evaluates state-of-the-art language models on this complex diagnostic task.
Contribution
The paper presents ANGST, the first benchmark for comorbid mental health diagnosis from social media, and provides a comprehensive evaluation of current language models on this challenging task.
Findings
GPT-4 outperforms other models but still has limited accuracy.
No model exceeds 72% F1 score in multi-label classification.
The task reveals significant challenges in applying language models to mental health diagnostics.
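To make the F1 figure above concrete, here is a minimal sketch of how a multi-label F1 score can be computed when each post may indicate depression, anxiety, both, or neither. The macro averaging, the function names, and the toy labels are assumptions for illustration only; this summary does not state which averaging scheme the paper reports, and none of the data below comes from ANGST.

```python
from typing import List, Sequence, Tuple

# Each post gets one binary flag per condition: (depression, anxiety).
# (1, 1) is a comorbid post, (0, 0) indicates neither condition.
LABELS: Tuple[str, ...] = ("depression", "anxiety")


def f1_for_label(y_true: Sequence[Sequence[int]],
                 y_pred: Sequence[Sequence[int]],
                 idx: int) -> float:
    """Binary F1 for the label at position idx, treating 1 as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t[idx] == 1 and p[idx] == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t[idx] == 0 and p[idx] == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t[idx] == 1 and p[idx] == 0)
    denom = 2 * tp + fp + fn
    # With no true or predicted positives, F1 is undefined; return 0.0 here.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Sequence[Sequence[int]],
             y_pred: Sequence[Sequence[int]]) -> float:
    """Unweighted mean of the per-label F1 scores (an assumed averaging choice)."""
    scores = [f1_for_label(y_true, y_pred, i) for i in range(len(LABELS))]
    return sum(scores) / len(scores)


# Hypothetical gold labels and predictions for four posts.
y_true: List[Tuple[int, int]] = [(1, 1), (1, 0), (0, 1), (0, 0)]
y_pred: List[Tuple[int, int]] = [(1, 0), (1, 0), (0, 1), (1, 0)]

if __name__ == "__main__":
    for i, name in enumerate(LABELS):
        print(f"{name}: F1 = {f1_for_label(y_true, y_pred, i):.3f}")
    print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

In this toy example the model misses the anxiety flag on the comorbid post and wrongly flags depression on the post with neither condition, so both per-label scores fall below 1. Scoring each label separately is what distinguishes the multi-label setting from treating the conditions as mutually exclusive classes.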
Abstract
In this study, we introduce ANGST, a novel, first-of-its-kind benchmark for depression-anxiety comorbidity classification from social media posts. Unlike contemporary datasets that often oversimplify the intricate interplay between different mental health disorders by treating them as isolated conditions, ANGST enables multi-label classification, allowing each post to be simultaneously identified as indicating depression and/or anxiety. Comprising 2876 posts meticulously annotated by expert psychologists and an additional 7667 silver-labeled posts, ANGST offers a more representative sample of online mental health discourse. Moreover, we benchmark ANGST using various state-of-the-art language models, ranging from Mental-BERT to GPT-4. Our results provide significant insights into the capabilities and limitations of these models in complex diagnostic scenarios. While GPT-4 generally…
Taxonomy
Topics: Mental Health via Writing
Methods: Attention Is All You Need · Dense Connections · Adam · Linear Layer · Residual Connection · Position-Wise Feed-Forward Layer · Label Smoothing · Dropout · Byte Pair Encoding · Absolute Position Encodings
