Interpreting Multimodal Communication at Scale in Short-Form Video: Visual, Audio, and Textual Mental Health Discourse on TikTok
Mingyue Zha, Ho-Chun Herbert Chang

TL;DR
This paper presents a scalable multimodal analysis framework combining automated feature extraction and interpretability methods to understand how text, visuals, and audio jointly influence engagement in TikTok mental health videos.
Contribution
It introduces a reproducible pipeline for interpretable multimodal analysis and applies it to mental health discourse on TikTok, revealing interaction patterns across modalities.
Findings
Facial expressions outperform textual sentiment in predicting viewership
Informational content attracts more attention than emotional support
Cross-modal interactions exhibit threshold-dependent effects
Abstract
Short-form video platforms integrate text, visuals, and audio into complex communicative acts, yet existing research analyzes these modalities in isolation, lacking scalable frameworks to interpret their joint contributions. This study introduces a pipeline combining automated multimodal feature extraction with Shapley value-based interpretability to analyze how text, visuals, and audio jointly influence engagement. Applying this framework to 162,965 TikTok videos and 814,825 images about social anxiety disorder (SAD), we find that facial expressions outperform textual sentiment in predicting viewership, informational content drives more attention than emotional support, and cross-modal synergies exhibit threshold-dependent effects. These findings demonstrate how multimodal analysis reveals interaction patterns invisible to single-modality approaches. Methodologically, we contribute a…
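To make the Shapley value idea mentioned in the abstract concrete, the following is a minimal sketch of exact Shapley attribution over three modality-level "players". It is not the authors' pipeline: the function names (`shapley_values`, `toy_value`) and all numeric scores are hypothetical and chosen only for illustration. The toy value function includes a visual-audio synergy term to show how an interaction between modalities is split between them.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values for a coalition value function.

    players: list of hashable player names (here, modalities).
    value:   function mapping a frozenset of players to a float.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        # Enumerate every coalition S that does not contain p.
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p when joining coalition S.
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def toy_value(coalition):
    """Hypothetical engagement score; the numbers are made up."""
    score = 0.0
    if "text" in coalition:
        score += 1.0
    if "visual" in coalition:
        score += 2.0
    if "audio" in coalition:
        score += 0.5
    # Assumed cross-modal synergy between visuals and audio.
    if "visual" in coalition and "audio" in coalition:
        score += 1.0
    return score


modalities = ["text", "visual", "audio"]
phi = shapley_values(modalities, toy_value)
for name in modalities:
    print(f"{name}: {phi[name]:.2f}")
```

In this toy setting the synergy term is shared equally between the visual and audio players, and the attributions sum to the value of the full coalition. Exact enumeration is exponential in the number of players, so analyses over many features generally rely on approximations rather than this brute-force form.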
Taxonomy
Topics: Mental Health via Writing · Media Influence and Health · Misinformation and Its Impacts
