Exploring Social Media Posts for Depression Identification: A Study on Reddit Dataset
Nandigramam Sai Harshit, Nilesh Kumar Sahu, Haroon R. Lone

TL;DR
This study explores using Reddit social media posts and machine learning to identify depression, achieving over 92% accuracy in classifying depressive versus non-depressive content.
Contribution
It demonstrates the feasibility of leveraging social media data and classical machine learning models for depression detection with high accuracy.
Findings
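Achieved 92.28% accuracy in classifying posts as depressive or non-depressive with classical machine learning models.
Labeled the collected posts as depressive or non-depressive using the UMLS Metathesaurus.
Analyzed the top Reddit posts made in 2022 in depression-related forums.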
Abstract
Depression is one of the most common mental disorders affecting an individual's personal and professional life. In this work, we investigated the possibility of utilizing social media posts to identify depression in individuals. To achieve this goal, we conducted a preliminary study where we extracted and analyzed the top Reddit posts made in 2022 from depression-related forums. The collected data were labeled as depressive and non-depressive using UMLS Metathesaurus. Further, the pre-processed data were fed to classical machine learning models, where we achieved an accuracy of 92.28% in predicting the depressive and non-depressive posts.
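The pipeline described in the abstract (lexicon-based labeling followed by a classical classifier) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the `DEPRESSION_TERMS` set is a hypothetical stand-in for terms drawn from the UMLS Metathesaurus, the example posts are invented, and multinomial Naive Bayes is picked as one example of a classical model because the abstract does not name the models used.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in lexicon; the study labeled posts with the UMLS
# Metathesaurus, whose actual term list is not reproduced here.
DEPRESSION_TERMS = {"depressed", "hopeless", "worthless", "suicidal"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def label_post(text, terms=DEPRESSION_TERMS):
    """Return 1 (depressive) if any lexicon term occurs in the post, else 0."""
    return int(any(tok in terms for tok in tokenize(text)))


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, y in zip(texts, labels):
            self.word_counts[y].update(tokenize(text))
        # Vocabulary is the union of all words seen in training.
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_class, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus smoothed log likelihood of each known token.
            score = math.log(self.doc_counts[c] / total_docs)
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            for tok in tokenize(text):
                if tok in self.vocab:  # skip words never seen in training
                    score += math.log((self.word_counts[c][tok] + 1) / denom)
            if score > best_score:
                best_class, best_score = c, score
        return best_class


# Invented example posts, for illustration only.
posts = [
    "I feel hopeless and worthless every day",
    "I have been depressed for months and cannot sleep",
    "Went hiking today and the weather was great",
    "Made pasta for dinner and watched a movie",
]
labels = [label_post(p) for p in posts]
model = NaiveBayes().fit(posts, labels)
```

Calling `model.predict(...)` on a new post returns 1 for depressive and 0 for non-depressive. A real experiment would also need a held-out test split to estimate accuracy, which this sketch omits.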
Taxonomy
Topics: Mental Health via Writing · Sentiment Analysis and Opinion Mining · Digital Mental Health Interventions
