Evaluating Large Language Models for Anxiety, Depression, and Stress Detection: Insights into Prompting Strategies and Synthetic Data
Mihael Arcan, David-Paul Niland

TL;DR
This study compares various large language models and traditional methods for detecting anxiety, depression, and stress from clinical interview texts, highlighting the benefits of transformer models and synthetic data augmentation.
Contribution
The paper presents a comprehensive evaluation of LLMs and synthetic data techniques for mental health detection and reports improved performance over classical models.
Findings
Distil-RoBERTa achieved the highest F1 score (0.883) on GAD-2.
XLNet outperformed the other models on the PHQ tasks, with F1 scores up to 0.891.
Synthetic data methods improved recall and model generalization.
Abstract
Mental health disorders affect over one-fifth of adults globally, yet detecting such conditions from text remains challenging due to the subtle and varied nature of symptom expression. This study evaluates multiple approaches for mental health detection, comparing Large Language Models (LLMs) such as Llama and GPT with classical machine learning and transformer-based architectures including BERT, XLNet, and Distil-RoBERTa. Using the DAIC-WOZ dataset of clinical interviews, we fine-tuned models for anxiety, depression, and stress classification and applied synthetic data generation to mitigate class imbalance. Results show that Distil-RoBERTa achieved the highest F1 score (0.883) for GAD-2, while XLNet outperformed others on PHQ tasks (F1 up to 0.891). For stress detection, a zero-shot synthetic approach (SD+Zero-Shot-Basic) reached an F1 of 0.884 and ROC AUC of 0.886. Findings…
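The abstract reports results as F1 and ROC AUC, two metrics commonly used when classes are imbalanced. As a minimal sketch of how these are typically computed for a binary label, the snippet below uses invented toy labels and scores (not taken from DAIC-WOZ). The function names `f1_score` and `roc_auc` are hypothetical helpers written for this illustration, and the paper's exact evaluation setup (for example, the averaging mode or decision threshold) is not stated here and is not assumed.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1: harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy, made-up data: 3 positives out of 8 examples (an imbalanced setting).
y_true = [1, 0, 0, 1, 0, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3, 0.7, 0.8]

# Assumed decision threshold of 0.5 for turning scores into hard labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"F1 = {f1:.3f}, ROC AUC = {auc:.3f}")
```

F1 depends on the chosen threshold, while ROC AUC is computed from the raw scores and summarizes ranking quality across all thresholds, which is one reason the two are often reported together.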
Taxonomy
Topics: Mental Health via Writing · Digital Mental Health Interventions · Sentiment Analysis and Opinion Mining
