SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection
Peng Qi, Zehong Yan, Wynne Hsu, Mong Li Lee

TL;DR
SNIFFER is a multimodal large language model designed to detect and explain out-of-context misinformation by assessing image-text consistency and leveraging external knowledge, significantly improving accuracy and interpretability.
Contribution
The paper introduces SNIFFER, a novel two-stage instruction-tuned multimodal model that enhances out-of-context misinformation detection and explanation capabilities.
Findings
SNIFFER surpasses the original MLLM by over 40% in detection accuracy.
It outperforms state-of-the-art methods in misinformation detection.
It provides accurate and persuasive explanations, as validated by evaluations.
Abstract
Misinformation is a prevalent societal issue due to its potential high risks. Out-of-context (OOC) misinformation, where authentic images are repurposed with false text, is one of the easiest and most effective ways to mislead audiences. Current methods focus on assessing image-text consistency but lack convincing explanations for their judgments, which is essential for debunking misinformation. While Multimodal Large Language Models (MLLMs) have rich knowledge and innate capability for visual reasoning and explanation generation, they still lack sophistication in understanding and discovering the subtle crossmodal differences. In this paper, we introduce SNIFFER, a novel multimodal large language model specifically engineered for OOC misinformation detection and explanation. SNIFFER employs two-stage instruction tuning on InstructBLIP. The first stage refines the model's concept…
Taxonomy
Topics: Misinformation and Its Impacts · Sentiment Analysis and Opinion Mining · Topic Modeling
Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Multi-Head Attention · Layer Normalization · Dropout · Softmax · Dense Connections · Label Smoothing · Adam
