Real-time stress detection on social network posts using big data technology
Hai-Yen Phan Nguyen, Phi-Lan Ly, Duc-Manh Le, Trong-Hop Do

TL;DR
This paper presents a real-time system for detecting stress in social media posts using big data technologies and a Reddit dataset, achieving around 69% accuracy with logistic regression.
Contribution
The study introduces a real-time stress detection system that combines big data tools with the Dreaddit Reddit dataset, with a focus on streaming data analysis.
Findings
Achieved 69.39% accuracy in stress detection with logistic regression.
Used Apache Kafka, PySpark, and Apache Airflow for system deployment.
Demonstrated effectiveness of logistic regression for streaming data.
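The idea of scoring posts one at a time with logistic regression can be illustrated with a minimal sketch. This is not the authors' implementation: the paper's system uses Apache Kafka, PySpark, and Airflow, while the sketch below uses only the Python standard library, a plain generator as a stand-in for the message stream, and a handful of invented toy posts. The class `OnlineLogisticRegression`, the helper `classify_stream`, and the `TRAIN` examples are all hypothetical names and data introduced for illustration.

```python
import math
import re

def tokenize(text):
    """Lowercase a post and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

class OnlineLogisticRegression:
    """Bag-of-words logistic regression trained with plain SGD.

    Hypothetical stand-in for a distributed model; features are raw
    token counts and there is no regularization.
    """

    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate
        self.weights = {}
        self.bias = 0.0

    def predict_proba(self, text):
        """Return the estimated probability that a post expresses stress."""
        score = self.bias + sum(self.weights.get(t, 0.0) for t in tokenize(text))
        score = max(-30.0, min(30.0, score))  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-score))

    def update(self, text, label):
        """Take one SGD step on a single labeled post (label is 0 or 1)."""
        error = self.predict_proba(text) - label
        for t in tokenize(text):
            self.weights[t] = self.weights.get(t, 0.0) - self.learning_rate * error
        self.bias -= self.learning_rate * error

# Invented toy examples (1 = stressful, 0 = not stressful); not from Dreaddit.
TRAIN = [
    ("i feel so anxious and overwhelmed i cannot sleep", 1),
    ("my deadlines are crushing me and i am panicking", 1),
    ("i am scared and overwhelmed by everything", 1),
    ("had a relaxing walk in the park today", 0),
    ("enjoyed a great dinner with friends", 0),
    ("the park was lovely and i had a great day", 0),
]

model = OnlineLogisticRegression()
for _ in range(50):  # several passes over the tiny training set
    for text, label in TRAIN:
        model.update(text, label)

def classify_stream(model, posts, threshold=0.5):
    """Score posts one by one as they arrive from any iterable."""
    for text in posts:
        probability = model.predict_proba(text)
        yield text, probability, int(probability >= threshold)

if __name__ == "__main__":
    incoming = iter(["i am panicking", "great day in the park"])
    for text, probability, label in classify_stream(model, incoming):
        print(f"{label}  p={probability:.2f}  {text}")
```

In a deployment like the one the paper describes, the generator would presumably be replaced by a Kafka consumer and the model by a PySpark pipeline; the per-message scoring loop is the part this sketch is meant to show.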
Abstract
In the context of modern life, particularly in Industry 4.0 within the online space, emotions and moods are frequently conveyed through social media posts. The trend of sharing stories, thoughts, and feelings on these platforms generates a vast and promising data source for Big Data. This creates both a challenge and an opportunity for research in applying technology to develop more automated and accurate methods for detecting stress in social media users. In this study, we developed a real-time system for stress detection in online posts, using the "Dreaddit: A Reddit Dataset for Stress Analysis in Social Media," which comprises 187,444 posts across five different Reddit domains. Each domain contains texts with both stressful and non-stressful content, showcasing various expressions of stress. A labeled dataset of 3,553 lines was created for training. Apache Kafka, PySpark, and AirFlow…
Taxonomy
Topics: Mental Health via Writing · Sentiment Analysis and Opinion Mining · Artificial Intelligence in Healthcare
Methods: Logistic Regression
