Dreaddit: A Reddit Dataset for Stress Analysis in Social Media
Elsbeth Turcan, Kathleen McKeown

TL;DR
Dreaddit is a Reddit corpus for stress detection: 190K posts from five categories of Reddit communities, with 3.5K segments drawn from 3K posts labeled via Amazon Mechanical Turk.
Contribution
The paper introduces Dreaddit, a multi-domain corpus of lengthy Reddit posts with crowdsourced stress annotations, and presents preliminary neural and traditional supervised methods for identifying stress.
Findings
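Preliminary supervised models, both neural and traditional, show promise in stress detection.
The complexity and characteristics of the data vary across the five Reddit categories.
The dataset supports stress analysis on lengthy, multi-domain social media text, beyond speech and short genres such as Twitter.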
Abstract
Stress is a nigh-universal human experience, particularly in the online world. While stress can be a motivator, too much stress is associated with many negative health outcomes, making its identification useful across a range of domains. However, existing computational research typically only studies stress in domains such as speech, or in short genres such as Twitter. We present Dreaddit, a new text corpus of lengthy multi-domain social media data for the identification of stress. Our dataset consists of 190K posts from five different categories of Reddit communities; we additionally label 3.5K total segments taken from 3K posts using Amazon Mechanical Turk. We present preliminary supervised learning methods for identifying stress, both neural and traditional, and analyze the complexity and diversity of the data and characteristics of each category.
Taxonomy
Topics: Mental Health via Writing · Digital Mental Health Interventions · Sentiment Analysis and Opinion Mining
