Evaluating Large Language Models for Anxiety and Depression Classification using Counseling and Psychotherapy Transcripts
Junwei Sun, Siqi Ma, Yiran Fan, Peter Washington

TL;DR
This study compares traditional machine learning and large language models in classifying anxiety and depression from psychotherapy transcripts, finding that advanced models do not outperform simpler methods.
Contribution
It provides a comprehensive evaluation of various models, including recent LLMs, for mental health classification tasks using conversational data.
Findings
State-of-the-art models do not improve classification accuracy over traditional methods.
Traditional machine learning with feature engineering remains competitive.
Large language models show limited benefit in this specific classification task.
Abstract
We aim to evaluate the efficacy of traditional machine learning and large language models (LLMs) in classifying anxiety and depression from long conversational transcripts. We fine-tune both established transformer models (BERT, RoBERTa, Longformer) and a more recent large model (Mistral-7B), train a Support Vector Machine with feature engineering, and assess GPT models through prompting. We observe that state-of-the-art models fail to enhance classification outcomes compared to traditional machine learning methods.
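To make the "feature engineering" step concrete, here is a minimal sketch of turning a transcript into a small numeric feature vector that a Support Vector Machine could consume. The abstract does not list the features actually used, so the word lists, the feature choices, and the function name `extract_features` are all illustrative assumptions, not the authors' method.

```python
import re
from collections import Counter
from typing import List

# Hypothetical lexicons for illustration only; the paper's real features are not given here.
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}
NEGATIVE_WORDS = {"sad", "worried", "anxious", "hopeless", "tired", "afraid"}


def extract_features(transcript: str) -> List[float]:
    """Return [token count, first-person ratio, negative-word ratio, type-token ratio]."""
    # Lowercase and keep alphabetic tokens (apostrophes allowed for contractions).
    tokens = re.findall(r"[a-z']+", transcript.lower())
    total = len(tokens)
    if total == 0:
        # Empty transcript: return a zero vector rather than dividing by zero.
        return [0.0, 0.0, 0.0, 0.0]
    counts = Counter(tokens)
    first_person = sum(counts[w] for w in FIRST_PERSON) / total
    negative = sum(counts[w] for w in NEGATIVE_WORDS) / total
    type_token = len(counts) / total  # lexical diversity
    return [float(total), first_person, negative, type_token]


example = "I feel anxious and tired. I worry about my job."
features = extract_features(example)
print(features)
```

Vectors like this, one per transcript, would then be scaled and passed to an SVM classifier alongside the anxiety or depression labels.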
Taxonomy
Topics: Mental Health via Writing · Mental Health Research Topics
Methods: Attention Is All You Need · Linear Warmup With Linear Decay · Byte Pair Encoding · Cosine Annealing · Layer Normalization · Linear Layer · Weight Decay · Softmax · WordPiece
