DiFaR: Enhancing Multimodal Misinformation Detection with Diverse, Factual, and Relevant Rationales
Herun Wan, Jiaying Wu, Minnan Luo, Xiangzheng Kong, Zihan Ma, Zhi Zeng

TL;DR
DiFaR is a framework that improves multimodal misinformation detection by generating diverse, factual, and relevant rationales with five chain-of-thought prompts and a post-hoc filtering module. It outperforms baselines by up to 5.9% and boosts existing detectors by up to 8.7%.
Contribution
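DiFaR is a detector-agnostic framework that combines five chain-of-thought prompts with a lightweight post-hoc filter, which selects rationale sentences by sentence-level factuality and relevance scores, to improve rationale quality in multimodal misinformation detection.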
Findings
Outperforms baseline methods by up to 5.9%
Boosts existing detectors by up to 8.7%
Improves rationale diversity, factuality, and relevance
Abstract
Generating textual rationales from large vision-language models (LVLMs) to support trainable multimodal misinformation detectors has emerged as a promising paradigm. However, its effectiveness is fundamentally limited by three core challenges: (i) insufficient diversity in generated rationales, (ii) factual inaccuracies due to hallucinations, and (iii) irrelevant or conflicting content that introduces noise. We introduce DiFaR, a detector-agnostic framework that produces diverse, factual, and relevant rationales to enhance misinformation detection. DiFaR employs five chain-of-thought prompts to elicit varied reasoning traces from LVLMs and incorporates a lightweight post-hoc filtering module to select rationale sentences based on sentence-level factuality and relevance scores. Extensive experiments on four popular benchmarks demonstrate that DiFaR outperforms four baseline categories by…
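The sentence-level filtering step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the scorer callables, the thresholds, the "keep a sentence only if both scores pass" rule, and the regex sentence splitter are all assumptions made for illustration, since the summary does not say how the scores are computed or combined.

```python
import re
from typing import Callable, List

# Hypothetical scorer type: maps one sentence to a score in [0, 1].
Scorer = Callable[[str], float]


def split_sentences(text: str) -> List[str]:
    """Naive splitter (assumption): break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def filter_rationales(
    rationales: List[str],
    factuality: Scorer,
    relevance: Scorer,
    min_factuality: float = 0.5,  # illustrative threshold, not from the paper
    min_relevance: float = 0.5,   # illustrative threshold, not from the paper
) -> List[str]:
    """Keep only sentences whose factuality and relevance scores both pass.

    `rationales` holds one rationale per prompt (five in DiFaR); the output
    has the same length, with low-scoring sentences removed from each entry.
    """
    filtered = []
    for rationale in rationales:
        kept = [
            s
            for s in split_sentences(rationale)
            if factuality(s) >= min_factuality and relevance(s) >= min_relevance
        ]
        filtered.append(" ".join(kept))
    return filtered
```

In practice the two scorers would be learned or model-based; here they are left as plain callables so that any scoring method can be plugged in, and the filtered rationales can then be passed to whichever detector is in use.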
