SMHD: A Large-Scale Resource for Exploring Online Language Usage for Multiple Mental Health Conditions
Arman Cohan, Bart Desmet, Andrew Yates, Luca Soldaini, Sean MacAvaney, and Nazli Goharian

TL;DR
This paper introduces SMHD, a large-scale dataset of social media posts from users with self-reported mental health diagnoses, which enables research on the language patterns associated with various mental health conditions.
Contribution
The paper's novel contribution is the SMHD dataset, built using high-precision patterns that identify self-reported diagnoses of nine mental health conditions.
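To make the idea of pattern-based identification concrete, here is a minimal sketch in Python. It is not the authors' pattern set: the cue regex, the two-entry `CONDITION_TERMS` lexicon, the `max_gap` window, and the function name `self_reported_conditions` are all assumptions chosen for illustration. The sketch only flags a post when a first-person diagnosis cue is followed closely by a known condition term, which is one simple way to favor precision over recall.

```python
import re

# Hypothetical lexicon for illustration only; the paper covers nine conditions
# and its actual term lists are not reproduced here.
CONDITION_TERMS = {
    "depression": ["depression", "major depressive disorder"],
    "adhd": ["adhd", "attention deficit hyperactivity disorder"],
}

# Assumed first-person diagnosis cue, e.g. "I was diagnosed with",
# "I have been officially diagnosed with".
DIAGNOSIS_CUE = re.compile(
    r"\bI\s+(?:was|am|have\s+been|got)\s+(?:\w+\s+)?diagnosed\s+with\b",
    re.IGNORECASE,
)


def self_reported_conditions(post, max_gap=40):
    """Return the set of conditions a post self-reports.

    A condition counts only if one of its terms appears within `max_gap`
    characters after a first-person diagnosis cue.
    """
    found = set()
    for cue in DIAGNOSIS_CUE.finditer(post):
        # Look only at a short window after the cue to limit false positives.
        window = post[cue.end(): cue.end() + max_gap].lower()
        for condition, terms in CONDITION_TERMS.items():
            if any(re.search(r"\b" + re.escape(t) + r"\b", window) for t in terms):
                found.add(condition)
    return found


if __name__ == "__main__":
    print(self_reported_conditions("I was diagnosed with ADHD last year."))
    print(self_reported_conditions("My brother was diagnosed with depression."))
```

Requiring a first-person subject means a post about someone else's diagnosis (the second example) is not matched. A real pipeline would need a much richer set of cues, negation handling, and condition synonyms.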
Findings
Language use differs across mental health conditions, as measured by linguistic and psychological variables
Text classification methods are effective at identifying mental health conditions from users' language
The SMHD dataset facilitates large-scale research on mental health language
Abstract
Mental health is a significant and growing public health concern. As language usage can be leveraged to obtain crucial insights into mental health conditions, there is a need for large-scale, labeled, mental health-related datasets of users who have been diagnosed with one or more of such conditions. In this paper, we investigate the creation of high-precision patterns to identify self-reported diagnoses of nine different mental health conditions, and obtain high-quality labeled data without the need for manual labelling. We introduce the SMHD (Self-reported Mental Health Diagnoses) dataset and make it available. SMHD is a novel large dataset of social media posts from users with one or multiple mental health conditions along with matched control users. We examine distinctions in users' language, as measured by linguistic and psychological variables. We further explore text…
Taxonomy
Topics: Mental Health via Writing · Sentiment Analysis and Opinion Mining · Topic Modeling
