Social Media-based Substance Use Prediction
Tao Ding, Warren K. Bickel, Shimei Pan

TL;DR
This paper applies multi-view unsupervised feature learning to social media data to predict tobacco, alcohol, and drug use, reaching 81% to 86% AUC, and uncovers relations involving users' social media behavior.
Contribution
It introduces a novel multi-view unsupervised feature learning method for social media data to improve substance use prediction accuracy.
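As a rough illustration of what multi-view unsupervised feature learning can look like, the sketch below projects two views of the same users (standing in for "likes" and "status updates") into a shared low-dimensional space using regularized canonical correlation analysis (CCA). CCA is used here only as a familiar example of a multi-view technique; this summary does not say which method the authors chose. The function name `cca_embed`, the regularization value, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def cca_embed(X, Y, k, reg=1e-3):
    """Hypothetical helper: regularized CCA over two views of the same users.

    X: (n_users, dx) features from view 1; Y: (n_users, dy) features from view 2.
    Returns k-dimensional projections of each view and the canonical correlations.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]

    # Regularized within-view covariances and the cross-view covariance.
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(C):
        # Inverse matrix square root of a symmetric positive-definite matrix.
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Wx, Wy = inv_sqrt(Cxx), inv_sqrt(Cyy)

    # Singular vectors of the whitened cross-covariance give the canonical directions.
    U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy)
    A = Wx @ U[:, :k]
    B = Wy @ Vt.T[:, :k]
    return Xc @ A, Yc @ B, s[:k]


# Synthetic data (an assumption): both views are noisy functions of a shared 2-d latent.
rng = np.random.default_rng(0)
n_users = 500
latent = rng.normal(size=(n_users, 2))
view_likes = latent @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(n_users, 6))
view_status = latent @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(n_users, 5))

Z_likes, Z_status, corrs = cca_embed(view_likes, view_status, k=2)

# One simple way to form a single user representation: concatenate both projections.
user_features = np.hstack([Z_likes, Z_status])
```

No labels are needed to learn the projections, so they can be fit on a large unlabeled pool of users; the resulting `user_features` could then feed a supervised classifier trained on the smaller labeled set.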
Findings
Achieved 86% AUC for tobacco use prediction
Achieved 81% AUC for alcohol use prediction
Achieved 84% AUC for drug use prediction
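For readers less familiar with the metric, AUC (area under the ROC curve) can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition. The labels and scores are made-up toy values, not data from the paper, and the helper name `auc` is hypothetical.

```python
def auc(labels, scores):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up values): 1 = substance user, 0 = non-user.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.3, 0.6, 0.4, 0.5, 0.2]
example_auc = auc(labels, scores)  # 8 of 9 pairs ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported figures should be read.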
Abstract
In this paper, we demonstrate how state-of-the-art machine learning and text mining techniques can be used to build effective social media-based substance use detection systems. Since a substance use ground truth is difficult to obtain on a large scale, to maximize system performance, we explore different feature learning methods to take advantage of a large amount of unsupervised social media data. We also demonstrate the benefit of using multi-view unsupervised feature learning to combine heterogeneous user information such as Facebook "likes" and "status updates" to enhance system performance. Based on our evaluation, our best models achieved 86% AUC for predicting tobacco use, 81% for alcohol use and 84% for drug use, all of which significantly outperformed existing methods. Our investigation has also uncovered interesting relations between a user's social media behavior (e.g.,…
Taxonomy
Topics: Spam and Phishing Detection · Sentiment Analysis and Opinion Mining · Web Data Mining and Analysis
