Public Health Advocacy Dataset: A Dataset of Tobacco Usage Videos from Social Media
Naga VS Raviteja Chappa, Charlotte McCormick, Susana Rodriguez Gongora, Page Daniel Dobbs, Khoa Luu

TL;DR
The paper introduces PHAD, a large multi-modal dataset of tobacco-related social media videos, and demonstrates a two-stage classification approach that improves content categorization for public health analysis.
Contribution
It provides the first comprehensive multi-modal dataset of tobacco videos from social media and introduces a novel Vision-Language Encoder-based classification method.
Findings
Superior classification accuracy with the VL Encoder approach
Identification of engagement trends in vaping and e-cigarette content
Dataset enables targeted public health interventions
Abstract
The Public Health Advocacy Dataset (PHAD) is a comprehensive collection of 5,730 videos related to tobacco products sourced from social media platforms such as TikTok and YouTube. This dataset encompasses 4.3 million frames and includes detailed metadata such as user engagement metrics, video descriptions, and search keywords. This is the first dataset with these features, providing a valuable resource for analyzing tobacco-related content and its impact. Our research employs a two-stage classification approach incorporating a Vision-Language (VL) Encoder, which demonstrates superior performance in accurately categorizing various types of tobacco products and usage scenarios. The analysis reveals significant user engagement trends, particularly with vaping and e-cigarette content, highlighting areas for targeted public health interventions. The PHAD addresses the need for multi-modal data in…
Taxonomy
Topics: Social Media in Health Education · Computational and Text Analysis Methods
