ViMGuard: A Novel Multi-Modal System for Video Misinformation Guarding
Andrew Kan, Christopher Kan, Zaid Nabulsi

TL;DR
ViMGuard introduces a multi-modal deep learning system that analyzes visual, audio, and textual data in short-form videos to detect misinformation, outperforming existing fact-checkers and enhancing social media trustworthiness.
Contribution
This work presents the first multi-modal architecture for short-form video (SFV) misinformation detection, integrating masked autoencoders and retrieval-augmented generation for comprehensive fact-checking.
Findings
ViMGuard outperforms three state-of-the-art fact-checkers.
It effectively analyzes visual, audio, and textual modalities.
Open-sourced code promotes further research and development.
Abstract
The rise of social media and short-form video (SFV) has created a breeding ground for misinformation. With the emergence of large language models, significant research has gone into curbing this misinformation problem with automatic false claim detection for text. Unfortunately, the automatic detection of misinformation in SFV is a more complex problem that remains largely unstudied. While text samples are monomodal (only containing words), SFVs comprise three different modalities: words, visuals, and non-linguistic audio. In this work, we introduce Video Masked Autoencoders for Misinformation Guarding (ViMGuard), the first deep-learning architecture capable of fact-checking an SFV through analysis of all three of its constituent modalities. ViMGuard leverages a dual-component system. First, Video and Audio Masked Autoencoders analyze the visual and non-linguistic audio elements of…
Taxonomy
Topics: Internet Traffic Analysis and Secure E-voting · Network Security and Intrusion Detection · Digital Media Forensic Detection
