MERIT: Modular Framework for Multimodal Misinformation Detection with Web-Grounded Reasoning

Mir Nafis Sharear Shopnil; Sharad Duwal; Abhishek Tyagi; Adiba Mahbub Proma

arXiv:2510.17590 · cs.AI · April 28, 2026

TL;DR

MERIT is a modular, inference-time framework for multimodal misinformation detection. By decomposing verification into four specialized modules, it outperforms all reported zero-shot baselines on MMFakeBench, and the authors also report strong generalization and explainability.

Contribution

Introduces MERIT, a modular framework that improves multimodal misinformation detection through its architecture of specialized modules and is compatible with any instruction-following vision-language model.

Findings

01

MERIT with GPT-4o-mini achieves 81.65% F1 on MMFakeBench, outperforming all reported zero-shot baselines, including GPT-4V with MMD-Agent (74.0% F1).

02

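Under identical model conditions, MERIT achieves 6.14 points higher misinformation recall than MMD-Agent, with per-class gains of +18.0 on visual distortion and +5.33 on textual distortion.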

03

Ablation studies show non-overlapping module specialization: removing any module disproportionately degrades its target category while leaving the others intact.
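
The findings above are stated in terms of F1 and per-class recall "points". As a rough illustration of how such numbers are typically computed, here is a minimal sketch in Python. The label names and the toy predictions are invented for illustration only and are not data from the paper. The use of macro-averaged F1 is also an assumption, since this summary does not say which averaging MMFakeBench results use.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Recall per label: correctly predicted items / items that truly have that label."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 (assumed averaging; not confirmed by the source)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def recall_gain_points(y_true, pred_a, pred_b, label):
    """Difference in recall for one label, in percentage points (a minus b)."""
    a = per_class_recall(y_true, pred_a)[label]
    b = per_class_recall(y_true, pred_b)[label]
    return 100.0 * (a - b)


# Toy data with hypothetical labels; NOT results from the paper.
y_true = ["real", "real", "visual", "visual", "textual", "textual"]
baseline = ["real", "real", "real", "visual", "textual", "real"]
modular = ["real", "real", "visual", "visual", "textual", "real"]

if __name__ == "__main__":
    print(per_class_recall(y_true, baseline))
    print(per_class_recall(y_true, modular))
    print(recall_gain_points(y_true, modular, baseline, "visual"))
    print(round(macro_f1(y_true, modular), 4))
```

A "points" difference here is a difference between two percentages, so a move from 50% to 100% recall on one class counts as 50 points, not a 100% relative improvement.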

Abstract

We present MERIT, an inference-time modular framework for multimodal misinformation detection that decomposes verification into four specialized modules: visual forensics, cross-modal alignment, retrieval-augmented claim verification, and calibrated judgment. On MMFakeBench, MERIT with GPT-4o-mini achieves 81.65% F1, outperforming all reported zero-shot baselines including GPT-4V with MMD-Agent (74.0% F1). A controlled same-model evaluation confirms gains stem from architectural design: MERIT achieves 6.14 points higher misinformation recall than MMD-Agent under identical model conditions, with per-class gains of +18.0 on visual distortion and +5.33 on textual distortion. Ablation studies reveal non-overlapping module specialization, where removing any module disproportionately degrades its target category while leaving others intact. Test set evaluation on 5,000 samples confirms…
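
To make the four-module decomposition and the ablation setup more concrete, the following is a minimal Python sketch of how such a pipeline could be wired. It is not the authors' implementation. The prompts, the `Sample` and `Finding` types, the `make_module`, `judge`, and `build_pipeline` helpers, and the `fake_vlm` stand-in are all hypothetical names introduced here. In particular, the web retrieval step and the calibration logic are reduced to placeholders.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple


@dataclass
class Sample:
    image_path: str
    text: str


@dataclass
class Finding:
    module: str
    flagged: bool
    rationale: str


# Hypothetical interface: any instruction-following VLM wrapped as (prompt, sample) -> answer.
VLM = Callable[[str, Sample], str]
Module = Callable[[Sample], Finding]

# Illustrative prompts only; the paper's actual prompts are not given in this summary.
MODULE_SPECS = [
    ("visual_forensics", "Does the image show signs of manipulation or generation?"),
    ("cross_modal_alignment", "Does the text contradict what the image shows?"),
    # A real claim-verification module would first retrieve web evidence; omitted here.
    ("claim_verification", "Is the textual claim contradicted by the available evidence?"),
]


def make_module(name: str, prompt: str, vlm: VLM) -> Module:
    """Wrap one prompt as a module that flags the sample when the answer starts with 'yes'."""
    def run(sample: Sample) -> Finding:
        answer = vlm(prompt, sample)
        return Finding(name, answer.strip().lower().startswith("yes"), answer)
    return run


def judge(findings: List[Finding]) -> str:
    """Placeholder for the judgment module: report the first flagged module, else 'real'."""
    flagged = [f.module for f in findings if f.flagged]
    return f"misinformation ({flagged[0]})" if flagged else "real"


def build_pipeline(vlm: VLM, disabled: FrozenSet[str] = frozenset()):
    """Assemble the pipeline; `disabled` names modules to drop, as in an ablation run."""
    modules = [make_module(n, p, vlm) for n, p in MODULE_SPECS if n not in disabled]

    def run(sample: Sample) -> Tuple[str, List[Finding]]:
        findings = [m(sample) for m in modules]
        return judge(findings), findings
    return run


def fake_vlm(prompt: str, sample: Sample) -> str:
    """Stand-in for a real model so the sketch runs offline; the rule is arbitrary."""
    if "manipulation" in prompt and "edited" in sample.image_path:
        return "Yes: inconsistent lighting around the subject."
    return "No issue found."


if __name__ == "__main__":
    sample = Sample("edited_photo.jpg", "A caption describing the photo.")
    full = build_pipeline(fake_vlm)
    ablated = build_pipeline(fake_vlm, disabled=frozenset({"visual_forensics"}))
    print(full(sample)[0])
    print(ablated(sample)[0])
```

In this toy setup, disabling `visual_forensics` changes the verdict only for samples that module would have flagged, which mirrors in miniature the non-overlapping specialization the ablation studies describe.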
