# Building an analytical framework for tobacco-related information on social media: an exploratory analysis with generative AI assistance

**Authors:** Eileen Han, Miao Feng, Pamela Ling

PMC · DOI: 10.1186/s12889-025-24767-w · BMC Public Health · 2025-10-28

## TL;DR

This paper develops a three-dimensional framework (content, type of falsehood, and source) for analyzing tobacco-related tweets, and finds that policy discussions are prevalent and that misleading pro-vaping content often misinterprets science and policy.

## Contribution

A novel multi-dimensional analytical framework for categorizing tobacco-related social media content, including policy discussions and misinformation.

## Key findings

- Policy-related discussions were prevalent on Twitter (X), a domain often overlooked in previous studies.
- Pro-vaping content frequently misinterprets scientific findings and policies.
- Misleading content includes fabrication, misrepresentation, and distortion of tobacco-related information.

## Abstract

The propagation of tobacco-related information that is inconsistent with public health guidance significantly impacts public health, particularly affecting people with less access to reliable information sources (such as those with lower education), who may also suffer disproportionate tobacco-related morbidity and mortality. This study develops a multi-dimensional analytical framework for identifying and categorizing tobacco-related information on social media. Using a dataset of tweets, the framework was constructed through qualitative analysis, which was then compared with an exploratory, AI-assisted analysis to assess the capabilities of current automated tools.

A collection of 3.4 million tweets related to tobacco and nicotine was refined to 842,754 after removing irrelevant and duplicate posts. LDA topic modeling identified six unique topics, from which two randomly selected samples of tweets were drawn to perform qualitative analysis and AI-assisted analysis to identify categories of tobacco information.
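The refinement and sampling steps described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the authors' pipeline: the keyword list, the normalization rules, and the per-topic sampling scheme are all assumptions, and topic labels are taken as given (for example, the dominant LDA topic per tweet from a separate modeling step).

```python
import random
import re
from collections import defaultdict

# Hypothetical keyword list; the study's actual relevance criteria are not stated here.
KEYWORDS = ("tobacco", "nicotine", "vape", "vaping", "cigarette", "smoking")


def normalize(text):
    """Lowercase, drop URLs and @mentions, and collapse whitespace."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def refine(tweets):
    """Keep the first copy of each relevant tweet, comparing normalized text."""
    seen, kept = set(), []
    for tweet in tweets:
        key = normalize(tweet)
        # Skip duplicates and tweets that mention none of the assumed keywords.
        if key in seen or not any(k in key for k in KEYWORDS):
            continue
        seen.add(key)
        kept.append(tweet)
    return kept


def sample_per_topic(tweets, topics, n, seed=0):
    """Draw up to n tweets at random from each topic label (labels assumed given)."""
    by_topic = defaultdict(list)
    for tweet, topic in zip(tweets, topics):
        by_topic[topic].append(tweet)
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return {t: rng.sample(items, min(n, len(items))) for t, items in by_topic.items()}
```

For scale, the reported counts imply that roughly a quarter of the collected tweets (842,754 of about 3.4 million) survived refinement.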

The identified tobacco-related information was categorized along three dimensions: (1) content, including safety and health effects, cessation, substance, and policy; (2) type of falsehood, including fabrication and unsubstantiated claims, misrepresentations, and distortions; and (3) source, ranging from individuals and retail stores to advocacy groups and influencers. A notable finding was the prevalence of policy-related discussions of tobacco information on Twitter (X), an often-overlooked domain. The controversy over vaping has amplified pro-vaping voices on social media, with content frequently misinterpreting scientific findings, policies, and expert opinions. Such misleading content reflects more nuanced and harder-to-recognize forms of falsehood.
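One way to picture the three dimensions is as an annotation schema for labeling individual tweets. The sketch below is illustrative only: the class and field names are hypothetical, the source categories listed are just those named in the abstract (the full paper may define more), and treating a missing falsehood label as "not misleading" is an assumption of this sketch.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Content(Enum):
    SAFETY_HEALTH = "safety and health effects"
    CESSATION = "cessation"
    SUBSTANCE = "substance"
    POLICY = "policy"


class Falsehood(Enum):
    FABRICATION = "fabrication and unsubstantiated claims"
    MISREPRESENTATION = "misrepresentation"
    DISTORTION = "distortion"


class Source(Enum):
    # Only the sources named in the abstract; possibly an incomplete list.
    INDIVIDUAL = "individual"
    RETAIL_STORE = "retail store"
    ADVOCACY_GROUP = "advocacy group"
    INFLUENCER = "influencer"


@dataclass(frozen=True)
class Annotation:
    """One labeled tweet (hypothetical record layout)."""
    tweet_id: str
    content: Content
    source: Source
    falsehood: Optional[Falsehood] = None  # None: no falsehood identified (assumption)

    @property
    def is_misleading(self):
        return self.falsehood is not None


example = Annotation("t1", Content.POLICY, Source.ADVOCACY_GROUP, Falsehood.DISTORTION)
```

Keeping the falsehood label optional lets the same record describe both accurate and misleading posts, which may be convenient when the annotations are later used as machine learning labels.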

This study offers a comprehensive framework for analyzing tobacco-related information on social media, emphasizing key issues in policy debates and the presence of conspiracy narratives. This framework can inform the design of interventions for less informed populations and enhance data annotation for machine learning tasks.

## Full-text entities

- **Chemicals:** nicotine (MESH:D009538)
- **Species:** Nicotiana tabacum (American tobacco, species) [taxon 4097]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12570755/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12570755/full.md

## References

30 references — full list in the complete paper: https://tomesphere.com/paper/PMC12570755/full.md

---
Source: https://tomesphere.com/paper/PMC12570755