UNITE-FND: Reframing Multimodal Fake News Detection through Unimodal Scene Translation
Arka Mukherjee, Shreya Ghosh

TL;DR
UNITE-FND transforms multimodal fake news detection into a unimodal text classification task using scene translation, achieving high accuracy with significantly reduced computational costs.
Contribution
UNITE-FND introduces a unimodal approach built on six specialized prompting strategies and a new dataset family, Uni-Fakeddit-55k, and outperforms prior multimodal models in both efficiency and accuracy.
Findings
Achieves 92.52% accuracy in binary classification.
Reduces computational costs by over 10x relative to prior multimodal models.
Introduces new metrics for image-to-text conversion quality.
Abstract
Multimodal fake news detection typically demands complex architectures and substantial computational resources, posing deployment challenges in real-world settings. We introduce UNITE-FND, a novel framework that reframes multimodal fake news detection as a unimodal text classification task. We propose six specialized prompting strategies with Gemini 1.5 Pro that convert visual content into structured textual descriptions, enabling efficient text-only models to preserve critical visual information. To benchmark our approach, we introduce Uni-Fakeddit-55k, a family of curated datasets of 55,000 samples each, all processed through our multimodal-to-unimodal translation framework. Experimental results demonstrate that UNITE-FND achieves 92.52% accuracy in binary classification, surpassing prior multimodal models while reducing computational costs by over 10x (TinyBERT variant: 14.5M…
Taxonomy
Topics: Misinformation and Its Impacts · Spam and Phishing Detection · Topic Modeling
