# Leveraging AI for Analysis of Digital Health Information on Cancer Prevention Among Arab Youth and Adults: Content Analysis

**Authors:** Alia Komsany, Obada Al Zoubi, Laetitia Sebaaly, Gabrielle Harrison, Orysya Soroka, Safa ElKefi, David Scales, Erica Phillips, Laura C Pinheiro, Israa Ismail, Perla Chebli

PMC · DOI: 10.2196/77888 · JMIR Infodemiology · 2026-02-09

## TL;DR

This study examines the quality of Arabic-language cancer prevention content on TikTok and evaluates the use of AI tools like GPT-4 for analyzing such content.

## Contribution

The study introduces a novel application of AI for analyzing short-form health videos in underrepresented languages like Arabic.

## Key findings

- Emotionally framed videos on cancer prevention received high engagement but had lower quality and informational value.
- GPT-4 showed high agreement with human coders for cancer type and quality scores but struggled with tone classification.
- Only 2 of the 30 most-viewed videos cited scientific literature, and diet and alternative therapies were the most common topic (half of the videos).

## Abstract

As TikTok (ByteDance) grows as a major platform for health information, the quality and accuracy of Arabic-language cancer prevention content remain unknown. Limited access to culturally relevant and evidence-based information may exacerbate disparities in cancer knowledge and prevention behaviors. Although large language models offer scalable approaches for analyzing online health content, their utility for short-form video data, especially in underrepresented languages, has not been well established.

We aimed to characterize and evaluate the quality of Arabic-language TikTok videos on cancer prevention and explore the use of large language models for scalable content analysis.

We used the TikTok research application programming interface and a GPT-assisted keyword strategy to collect Arabic-language TikTok videos (2021-2024). From an initial collection of 1800 TikTok videos, 320 were eligible after preprocessing. Of these, the top 25% (N=30) most-viewed were analyzed and manually coded for content type, cancer type, uploader identity, tone and register, scientific citation, and disclaimers. Video quality was assessed using the Patient Education Materials Assessment Tool for Audiovisual Materials (PEMAT-A/V), which scores understandability and actionability, and the Global Quality Scale (GQS), a 5-point rating of overall quality. GPT-4 was used to generate artificial intelligence annotations, which were compared with human coding for selected variables.
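The summary does not spell out how the Patient Education Materials Assessment Tool for Audiovisual Materials scores become "low understandability" or "low actionability". A minimal sketch, assuming the standard PEMAT scoring (each applicable item rated Agree = 1 or Disagree = 0, Not Applicable items excluded, score = percentage of applicable items rated Agree) and a 70% cutoff that is common in the literature but is an assumption here. The helpers `pemat_score` and `is_low` and the example ratings are hypothetical, not the authors' code or data.

```python
from typing import Optional, Sequence

# Hypothetical helpers, not the authors' code.
# Each PEMAT item is rated 1 (Agree), 0 (Disagree), or None (Not Applicable).


def pemat_score(ratings: Sequence[Optional[int]]) -> float:
    """Return the percentage of applicable items rated Agree (N/A items are dropped)."""
    applicable = [r for r in ratings if r is not None]
    if not applicable:
        raise ValueError("no applicable items to score")
    if any(r not in (0, 1) for r in applicable):
        raise ValueError("ratings must be 0 (Disagree), 1 (Agree), or None (N/A)")
    return 100 * sum(applicable) / len(applicable)


def is_low(score: float, cutoff: float = 70.0) -> bool:
    """Flag a domain as low when its score falls below the assumed 70% cutoff."""
    return score < cutoff


# Illustrative ratings for one made-up video (not study data).
understandability = pemat_score([1, 1, 0, 1, None, 0, 1, 0, 1, None, 0, 1])
actionability = pemat_score([1, 0, None, 0, 1])
print(understandability, is_low(understandability))  # 60.0 True
print(actionability, is_low(actionability))  # 50.0 True
```

Counting the videos that `is_low` flags in each domain gives proportions of the kind reported for understandability and actionability, provided the authors applied the same cutoff.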

The top 25% (N=30) most-viewed videos amassed a total of 21.6 million views. Diet and alternative therapies were the most common content type (15/30, 50%), including recommendations to reduce hydrogenated oils, increase fruit and vegetable intake, and use traditional remedies such as garlic and black seed. Only 6.7% (2/30) of videos cited scientific literature. General cancer (15/30, 53%), breast cancer (5/30, 17%), and cervical cancer (4/30, 13%) were the most frequently mentioned cancer types. Doctors led 30% (9/30) of videos and tended to produce higher-quality content, with higher GQS scores than videos from other uploaders (median 4, IQR 4-4 vs median 3, IQR 2-3), although this difference did not reach statistical significance (P=.06). Over half of the videos had low understandability (16/30, 53%) and low actionability (18/30, 60%). Emotionally framed content had the highest engagement in likes and shares, although neither difference reached statistical significance (P=.08 and P=.05, respectively). However, emotional tone was significantly associated with lower GQS scores (P=.01). GPT-4 showed perfect agreement with human coders for cancer type (Cohen κ=1.0) and strong agreement for GQS (κ=0.94), but low agreement for tone classification (κ=0.15), which was attributed to misclassification of emotional delivery from text-only input.
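Agreement between GPT-4 and human coders is reported as Cohen κ, which corrects raw percentage agreement for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is chance agreement from each rater's label frequencies. The sketch below implements unweighted κ for nominal labels such as cancer type or tone, plus a weighted variant for ordinal ratings such as the 5-point GQS. Whether the authors used weighted or unweighted κ for GQS is not stated in this summary; the function names, tone labels, and ratings are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1:
        return float("nan")  # both raters used one identical label: kappa is undefined
    return (p_o - p_e) / (1 - p_e)


def weighted_kappa(rater_a: Sequence[int], rater_b: Sequence[int],
                   categories: Sequence[int], power: int = 2) -> float:
    """Weighted kappa for ordinal scales (power=1: linear, power=2: quadratic weights)."""
    if len(rater_a) != len(rater_b) or not rater_a or len(categories) < 2:
        raise ValueError("need equal-length, non-empty ratings and at least 2 categories")
    k, n = len(categories), len(rater_a)
    index = {c: i for i, c in enumerate(categories)}
    # Observed joint proportions O[i][j].
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    # Disagreement weights grow with the distance between categories.
    weight = [[(abs(i - j) / (k - 1)) ** power for j in range(k)] for i in range(k)]
    num = sum(weight[i][j] * observed[i][j] for i in range(k) for j in range(k))
    den = sum(weight[i][j] * row[i] * col[j] for i in range(k) for j in range(k))
    if den == 0:
        return float("nan")  # no expected disagreement: kappa is undefined
    return 1 - num / den


# Made-up tone labels (not study data).
human = ["emotional", "neutral", "emotional", "informative", "neutral", "emotional"]
gpt = ["neutral", "neutral", "informative", "informative", "neutral", "emotional"]
print(round(cohen_kappa(human, gpt), 2))  # 0.52
```

Quadratic weights penalize a 2-versus-5 disagreement on GQS far more than a 3-versus-4 one, which is why weighted κ is usually preferred for ordinal scales; nominal categories such as tone have no ordering, so unweighted κ is the natural choice there.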

Arabic-language TikTok cancer prevention content is highly engaging but variable in quality, with emotionally framed videos attracting substantial attention despite lower informational value. Artificial intelligence-assisted tools show strong potential for scalable, multilingual health content analysis, but multimodal approaches are needed to accurately interpret tonal and audiovisual features.

## Linked entities

- **Diseases:** cancer (MONDO:0004992), breast cancer (MONDO:0004989), cervical cancer (MONDO:0002974)

## Full-text entities

- **Diseases:** Cancer (MESH:D009369), breast (MESH:D061325), cervical (MESH:D002575)
- **Chemicals:** hydrogenated oils (-)
- **Species:** Homo sapiens (human, species) [taxon 9606], Allium sativum (garlic, species) [taxon 4682]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12930147/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12930147/full.md

## References

44 references — full list in the complete paper: https://tomesphere.com/paper/PMC12930147/full.md

---
Source: https://tomesphere.com/paper/PMC12930147