A Benchmark Suite of Reddit-Derived Datasets for Mental Health Detection
Khalid Hasan, Jamil Saquer

TL;DR
This paper introduces a comprehensive, validated benchmark suite of four Reddit-based datasets designed for mental health detection tasks, facilitating reproducibility and cross-task comparison in NLP research.
Contribution
The paper provides a well-annotated, multi-task collection of benchmark datasets for mental health detection from Reddit, enabling standardized evaluation across tasks.
Findings
Transformer and contextualized recurrent models achieve high performance (F1 ~ 93-99%) on these tasks.
All datasets have high inter-annotator agreement (>0.8), ensuring label reliability.
The benchmark supports reproducible, cross-task mental health NLP studies.
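The two quantities reported above, F1 and inter-annotator agreement, can be illustrated with a small sketch. This summary does not say which agreement statistic the authors used, so Cohen's kappa for two annotators is an assumption here, as are the function names and the toy labels. The F1 shown is the plain binary F1 for one positive class, not necessarily the averaging scheme used in the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumption: the source only reports agreement ">0.8" and does not
    name the statistic; kappa is used here purely for illustration.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: from each annotator's marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    classes = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in classes) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def binary_f1(y_true, y_pred, positive=1):
    """F1 score (harmonic mean of precision and recall) for one class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels (1 = positive class), not data from the paper.
annotator_a = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
annotator_b = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
kappa = cohen_kappa(annotator_a, annotator_b)

gold = [1, 1, 1, 0, 0]
predicted = [1, 1, 0, 0, 1]
f1 = binary_f1(gold, predicted)
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is usually preferred over a simple percentage when classes are imbalanced.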
Abstract
The growing availability of online support groups has opened new windows for studying mental health through natural language processing (NLP). However, this research is hindered by a lack of high-quality, well-validated datasets. Existing studies tend to build task-specific corpora without consolidating them into widely available resources, which makes reproducibility and cross-task comparison challenging. In this paper, we present a unified benchmark suite of four Reddit-based datasets for distinct but complementary tasks: (i) suicidal ideation detection, (ii) binary general mental disorder detection, (iii) bipolar disorder detection, and (iv) multi-class mental disorder classification. All datasets were built through careful linguistic inspection, well-defined annotation guidelines, and human verification. Inter-annotator agreement metrics always exceeded the…
