Multimodal Misinformation Detection Using Early Fusion of Linguistic, Visual, and Social Features
Gautam Kishore Shahi

TL;DR
This paper explores the effectiveness of early fusion of linguistic, visual, and social features in multimodal misinformation detection on Twitter, demonstrating improved classification accuracy during critical periods like elections and pandemics.
Contribution
It introduces a multimodal classification approach combining text, images, and social features with early fusion, enhancing misinformation detection performance over unimodal and bimodal models.
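As a rough illustration of what early fusion means here, the minimal sketch below concatenates per-modality feature vectors into one joint vector before any classifier sees them. Everything in it is an assumption made for illustration and is not taken from the paper: the function names (`l2_normalize`, `early_fusion`, `centroid`, `predict`), the toy two-dimensional features, the labels, and the nearest-centroid classifier, which merely stands in for whatever model the study actually trained.

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so no modality dominates by magnitude."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def early_fusion(text_vec: Sequence[float],
                 image_vec: Sequence[float],
                 social_vec: Sequence[float]) -> List[float]:
    """Early fusion: join the modality features into one input vector."""
    fused: List[float] = []
    for modality in (text_vec, image_vec, social_vec):
        fused.extend(l2_normalize(modality))
    return fused


def centroid(rows: Sequence[Sequence[float]]) -> List[float]:
    """Column-wise mean of a list of equally sized vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def predict(fused: Sequence[float], centroids: Dict[str, List[float]]) -> str:
    """Return the label whose centroid is closest to the fused vector."""
    return min(centroids, key=lambda label: math.dist(fused, centroids[label]))


# Hypothetical toy data: (text, image, social) features and a label.
samples = [
    (([0.9, 0.1], [0.8, 0.2], [0.7, 0.3]), "misinformation"),
    (([0.8, 0.2], [0.9, 0.1], [0.6, 0.4]), "misinformation"),
    (([0.1, 0.9], [0.2, 0.8], [0.3, 0.7]), "other"),
    (([0.2, 0.8], [0.1, 0.9], [0.4, 0.6]), "other"),
]

# Fuse first, then fit a single model on the joint representation.
by_label: Dict[str, List[List[float]]] = {}
for (text, image, social), label in samples:
    by_label.setdefault(label, []).append(early_fusion(text, image, social))
centroids = {label: centroid(rows) for label, rows in by_label.items()}

query = early_fusion([0.85, 0.15], [0.7, 0.3], [0.9, 0.1])
print(len(query), predict(query, centroids))
```

The design point is that a single classifier is trained on the joint vector, in contrast to late fusion, where separate per-modality models are trained and their predictions are combined afterwards.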
Findings
15% improvement in classification performance over unimodal models
5% improvement in classification performance over bimodal models
Analyzed misinformation propagation patterns
Abstract
Amid a tidal wave of misinformation flooding social media during elections and crises, extensive research has been conducted on misinformation detection, primarily focusing on text-based or image-based approaches. However, only a few studies have explored multimodal feature combinations, such as integrating text and images for building a classification model to detect misinformation. This study investigates the effectiveness of different multimodal feature combinations, incorporating text, images, and social features using an early fusion approach for the classification model. This study analyzed 1,529 tweets containing both text and images during the COVID-19 pandemic and election periods collected from Twitter (now X). A data enrichment process was applied to extract additional social features, as well as visual features, through techniques such as object detection and optical…
