A Gold Standard Dataset and Evaluation Framework for Depression Detection and Explanation in Social Media using LLMs

Prajval Bolegave; Pushpak Bhattacharya

arXiv:2507.19899 · cs.CL · July 29, 2025

TL;DR

This paper introduces a high-quality, expert-annotated dataset of social media posts for depression detection, along with an evaluation framework for assessing the faithfulness and quality of LLM-generated explanations, highlighting differences among state-of-the-art models.

Contribution

The work provides a fine-grained, clinically grounded dataset and a novel evaluation framework for LLM explanations in depression detection, advancing transparency and safety in mental health AI applications.

Findings

01. Significant performance differences among LLMs on clinical explanation tasks.

02. Zero-shot and few-shot prompting strategies impact model explanations.

03. Expert-guided prompts improve LLM explanation quality.

Abstract

Early detection of depression from online social media posts holds promise for providing timely mental health interventions. In this work, we present a high-quality, expert-annotated dataset of 1,017 social media posts labeled with depressive spans and mapped to 12 depression symptom categories. Unlike prior datasets that primarily offer coarse post-level labels (Cohan et al., 2018), our dataset enables fine-grained evaluation of both model predictions and generated explanations. We develop an evaluation framework that leverages this clinically grounded dataset to assess the faithfulness and quality of natural language explanations generated by large language models (LLMs). Through carefully designed prompting strategies, including zero-shot and few-shot approaches with domain-adapted examples, we evaluate state-of-the-art proprietary LLMs including GPT-4.1, Gemini 2.5 Pro, and…
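
To make the idea of span-level evaluation concrete, the following is a minimal sketch of how text cited in a model's explanation could be compared against expert-annotated depressive spans. It is an illustration only: the abstract excerpt does not state which metrics the framework uses, so the character-level overlap F1 here, the `Span` type, the `span_overlap_f1` function, and the symptom label are all assumptions, not the authors' method or category names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A labeled text span (hypothetical schema, not the dataset's actual format)."""
    start: int    # inclusive character offset into the post
    end: int      # exclusive character offset into the post
    symptom: str  # placeholder label; the 12 real category names are not given here


def covered_chars(spans):
    """Return the set of character offsets covered by any span in the list."""
    covered = set()
    for span in spans:
        covered.update(range(span.start, span.end))
    return covered


def span_overlap_f1(gold, predicted):
    """Character-level precision, recall, and F1 between gold and predicted spans."""
    gold_chars = covered_chars(gold)
    pred_chars = covered_chars(predicted)
    if not gold_chars and not pred_chars:
        # Nothing annotated and nothing predicted: treat as perfect agreement.
        return 1.0, 1.0, 1.0
    overlap = len(gold_chars & pred_chars)
    precision = overlap / len(pred_chars) if pred_chars else 0.0
    recall = overlap / len(gold_chars) if gold_chars else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


# Invented example post and spans, for illustration only.
post = "I can't sleep and nothing feels worth doing anymore."
gold = [Span(0, 13, "sleep_disturbance")]       # "I can't sleep"
predicted = [Span(2, 13, "sleep_disturbance")]  # "can't sleep"

precision, recall, f1 = span_overlap_f1(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

A character-level comparison gives partial credit when a model quotes only part of an annotated span, which post-level labels cannot capture. A fuller version might also check whether the predicted symptom category matches the gold one.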
