Exposing Cross-Modal Consistency for Fake News Detection in Short-Form Videos
Chong Tian, Yu Wang, Chenxu Yang, Junyi Guan, Zheng Lin, Yuhan Liu, Xiuying Chen, Qirong Ho

TL;DR
This paper introduces MAGIC3, a novel fake news detection method for short videos that leverages cross-modal consistency signals across text, visuals, and audio to improve accuracy and efficiency.
Contribution
MAGIC3 is the first detector to explicitly model and exploit cross-tri-modal consistency at multiple granularities for fake news detection in short videos.
Findings
MAGIC3 outperforms baselines that do not use a vision-language model (VLM) on the FakeSV and FakeTT datasets.
The model achieves VLM-level accuracy with significantly higher throughput and lower VRAM usage.
Cross-modal consistency signals effectively distinguish real from fake videos.
Abstract
Short-form video platforms are major channels for news but also fertile ground for multimodal misinformation, in which each modality appears plausible on its own yet the cross-modal relationships are subtly inconsistent, such as visuals that do not match their captions. On two benchmark datasets, FakeSV (Chinese) and FakeTT (English), we observe a clear asymmetry: real videos exhibit high text-visual but moderate text-audio consistency, while fake videos show the opposite pattern. Moreover, a single global consistency score forms an interpretable axis along which fake probability and prediction errors vary smoothly. Motivated by these observations, we present MAGIC3 (Modal-Adversarial Gated Interaction and Consistency-Centric Classifier), a detector that explicitly models and exposes cross-tri-modal consistency signals at multiple granularities. MAGIC3 combines explicit pairwise and global consistency modeling…
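To make the idea of pairwise and global consistency concrete, here is a minimal sketch. It is not the MAGIC3 architecture, whose details are not given in this summary. It assumes each modality has already been encoded into a vector in a shared embedding space, uses cosine similarity as a stand-in consistency measure, and takes the mean of the three pairwise scores as a hypothetical global score. The function names, the pair list, and the toy vectors are all illustrative assumptions.

```python
import math

# Modality pairs to compare (naming is an assumption of this sketch).
PAIRS = [("text", "visual"), ("text", "audio"), ("visual", "audio")]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def consistency_scores(embeddings):
    """Return (pairwise scores, global score) for one video.

    `embeddings` maps "text", "visual", and "audio" to vectors that are
    assumed to live in one shared embedding space. The global score here
    is simply the mean of the pairwise scores, a stand-in for whatever
    aggregation the paper actually uses.
    """
    pairwise = {
        f"{a}-{b}": cosine(embeddings[a], embeddings[b]) for a, b in PAIRS
    }
    global_score = sum(pairwise.values()) / len(pairwise)
    return pairwise, global_score


# Toy vectors, made up for illustration only (not data from the paper):
# text and visuals agree, audio points in an unrelated direction.
video = {
    "text": [1.0, 0.0, 0.0],
    "visual": [1.0, 0.0, 0.0],
    "audio": [0.0, 1.0, 0.0],
}
pairwise, global_score = consistency_scores(video)
```

In a real pipeline, scores like these could be fed to a classifier as features or inspected directly, which is the sense in which a consistency score can serve as an interpretable axis.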
Taxonomy
Topics: Misinformation and Its Impacts · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis
