BigTokDetect: A Clinically-Informed Vision-Language Modeling Framework for Detecting Pro-Bigorexia Videos on TikTok
Minh Duc Chu, Kshitij Pawar, Zihao He, Roxanna Sharifi, Ross Sonnenblick, Magdalayna Curry, Laura D'Adamo, Lindsay Young, Stuart B Murray, Kristina Lerman

TL;DR
BigTokDetect is a clinically-informed vision-language framework that effectively detects pro-bigorexia videos on TikTok, leveraging a new expert-annotated multimodal dataset and demonstrating improved performance through multimodal fusion and fine-tuning.
Contribution
The paper introduces BigTokDetect, a clinically informed detection framework, together with BigTok, the first expert-annotated multimodal benchmark dataset for pro-bigorexia content, and evaluates state-of-the-art vision-language models for automated detection.
Findings
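Supervised fine-tuning enables smaller open-source models to perform better on fine-grained subcategory detection, while commercial zero-shot models achieve the highest accuracy on the broad primary categories.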
Multimodal fusion enhances model performance by 5-15%.
Video features are the most discriminative signals for detection.
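To make "multimodal fusion" concrete, here is a minimal sketch of one common approach, feature-level fusion by concatenation followed by a linear score. This summary does not say which fusion method the authors use, so this is only an illustration. The function names, modality names, feature values, and weights are all hypothetical.

```python
from typing import Dict, List, Sequence


def fuse_features(modalities: Dict[str, Sequence[float]],
                  order: Sequence[str]) -> List[float]:
    """Concatenate per-modality feature vectors in a fixed order.

    Hypothetical helper: simple feature-level fusion, not the paper's method.
    """
    fused: List[float] = []
    for name in order:
        fused.extend(modalities[name])
    return fused


def linear_score(features: Sequence[float],
                 weights: Sequence[float],
                 bias: float = 0.0) -> float:
    """Score a fused feature vector with a linear model (weights are assumed)."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(f * w for f, w in zip(features, weights)) + bias


# Toy example with made-up features for a single video.
example = {"text": [0.2, 0.1], "video": [0.9, 0.4, 0.3]}
fused = fuse_features(example, order=["text", "video"])
score = linear_score(fused, weights=[1.0, 1.0, 1.0, 1.0, 1.0])
```

Under this setup, a text-only baseline would score just the `text` slice of the vector, so comparing it against the fused score is one way to see what the additional modality contributes.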
Abstract
Social media platforms face escalating challenges in detecting harmful content that promotes muscle dysmorphic behaviors and cognitions (bigorexia). This content can evade moderation by camouflaging as legitimate fitness advice and disproportionately affects adolescent males. We address this challenge with BigTokDetect, a clinically informed framework for identifying pro-bigorexia content on TikTok. We introduce BigTok, the first expert-annotated multimodal benchmark dataset of over 2,200 TikTok videos labeled by clinical psychiatrists across five categories and eighteen fine-grained subcategories. Comprehensive evaluation of state-of-the-art vision-language models reveals that while commercial zero-shot models achieve the highest accuracy on broad primary categories, supervised fine-tuning enables smaller open-source models to perform better on fine-grained subcategory detection.…