ReDSM5: A Reddit Dataset for DSM-5 Depression Detection
Eliseo Bao, Anxo Pérez, Javier Parapar

TL;DR
ReDSM5 is a new Reddit dataset with detailed sentence-level annotations by psychologists linking social media language to DSM-5 depression symptoms, enabling more interpretable depression detection models.
Contribution
It introduces ReDSM5, a novel corpus with expert-annotated DSM-5 symptom labels and explanations, bridging the gap between social media language and clinical depression diagnosis.
Findings
Baseline models for symptom classification are established.
Analysis reveals lexical and emotional patterns of depression symptoms.
ReDSM5 enhances interpretability in depression detection models.
Abstract
Depression is a pervasive mental health condition that affects hundreds of millions of individuals worldwide, yet many cases remain undiagnosed due to barriers to traditional clinical access and widespread stigma. Social media platforms, and Reddit in particular, offer rich, user-generated narratives that can reveal early signs of depressive symptomatology. However, existing computational approaches often label entire posts simply as depressed or not depressed, without linking language to specific criteria from the DSM-5, the standard clinical framework for diagnosing depression. This limits both clinical relevance and interpretability. To address this gap, we introduce ReDSM5, a novel Reddit corpus comprising 1484 long-form posts, each exhaustively annotated at the sentence level by a licensed psychologist for the nine DSM-5 depression symptoms. For each label, the annotator also…
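To make the annotation scheme concrete, here is a minimal Python sketch of one way sentence-level symptom labels could be represented and collapsed into a post-level, nine-dimensional multi-hot vector (the form a multi-label symptom classifier would typically consume). It is not ReDSM5's actual file format: the names `Symptom`, `SentenceAnnotation`, `Post`, and `post_labels` are hypothetical. It also assumes that each label carries a free-text explanation, as the Contribution line mentions, and that a sentence may carry several labels. Only the list of nine DSM-5 criterion symptoms is standard.

```python
from dataclasses import dataclass, field
from enum import Enum


class Symptom(Enum):
    # The nine DSM-5 criterion symptoms for a major depressive episode.
    # Member names are hypothetical, not ReDSM5's actual label strings.
    DEPRESSED_MOOD = 1
    ANHEDONIA = 2
    APPETITE_CHANGE = 3
    SLEEP_ISSUES = 4
    PSYCHOMOTOR = 5
    FATIGUE = 6
    WORTHLESSNESS = 7
    COGNITIVE_ISSUES = 8
    SUICIDAL_THOUGHTS = 9


@dataclass
class SentenceAnnotation:
    text: str
    # Assumption: each assigned symptom maps to the annotator's explanation.
    labels: dict[Symptom, str] = field(default_factory=dict)


@dataclass
class Post:
    post_id: str
    sentences: list[SentenceAnnotation]

    def post_labels(self) -> list[int]:
        """Union of sentence-level labels as a 9-dim multi-hot vector."""
        present: set[Symptom] = set()
        for sentence in self.sentences:
            present.update(sentence.labels.keys())
        # Enum iteration follows definition order, fixing the vector layout.
        return [int(symptom in present) for symptom in Symptom]


# Usage with made-up example sentences (not drawn from the corpus).
post = Post("example-1", [
    SentenceAnnotation("I haven't slept properly in weeks.",
                       {Symptom.SLEEP_ISSUES: "reports persistent insomnia"}),
    SentenceAnnotation("I went to the store today."),
])
print(post.post_labels())  # → [0, 0, 0, 1, 0, 0, 0, 0, 0]
```

Keeping the labels at the sentence level and deriving post labels on demand preserves the evidence and explanation behind each post-level decision. That evidence is what makes the annotations useful for interpretability.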
Taxonomy
Topics: Mental Health via Writing · Digital Mental Health Interventions · Sentiment Analysis and Opinion Mining
