Large Visual-Language Models Are Also Good Classifiers: A Study of In-Context Multimodal Fake News Detection

Ye Jiang; Yimin Wang

arXiv:2407.12879·cs.CL·April 17, 2025·1 cites

TL;DR

This study evaluates large visual-language models' ability to detect fake news, demonstrating that integrating smaller model predictions into in-context learning significantly improves their accuracy in multimodal fake news detection tasks.

Contribution

The paper introduces the IMFND framework that combines in-context learning with predictions from smaller models, substantially enhancing LVLMs' fake news detection performance.
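
As a rough illustration of the idea described above, the sketch below builds a text-only in-context prompt in which each demonstration and the query carry a smaller model's predicted label and confidence. Everything here is an assumption made for illustration: the `NewsItem` class, the `format_item` and `build_prompt` helpers, the prompt wording, and the label names are hypothetical and are not taken from the paper, and image inputs are left out entirely.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NewsItem:
    """One news post plus a (hypothetical) smaller model's output for it."""
    text: str
    small_label: str                  # smaller model's predicted label: "real" or "fake"
    small_prob: float                 # smaller model's confidence in that label, 0..1
    gold_label: Optional[str] = None  # known answer; set only for demonstrations


def format_item(item: NewsItem) -> str:
    """Render one item; the answer line is left open when no gold label is set."""
    lines = [
        f"News: {item.text}",
        f"Smaller model prediction: {item.small_label} "
        f"(confidence {item.small_prob:.2f})",
        f"Answer: {item.gold_label}" if item.gold_label else "Answer:",
    ]
    return "\n".join(lines)


def build_prompt(demos: List[NewsItem], query: NewsItem) -> str:
    """Concatenate an instruction, the labelled demonstrations, and the open query."""
    instruction = (
        "Decide whether each news post is real or fake. "
        "A smaller model's prediction is given as a hint and may be wrong."
    )
    blocks = [instruction] + [format_item(d) for d in demos] + [format_item(query)]
    return "\n\n".join(blocks)


if __name__ == "__main__":
    demos = [
        NewsItem("Post A", "fake", 0.91, gold_label="fake"),
        NewsItem("Post B", "real", 0.60, gold_label="real"),
    ]
    print(build_prompt(demos, NewsItem("Post C", "fake", 0.55)))
```

The resulting string would be sent to an LVLM together with the post's image; how the paper actually formats its prompts and which smaller-model signals it passes along may differ from this sketch.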

Findings

1. LVLMs perform competitively with smaller models in zero-shot FND.

2. In-context learning improves FND performance, but the gains are limited without additional strategies.

3. The IMFND framework significantly boosts LVLMs' accuracy across multiple datasets.

Abstract

Large visual-language models (LVLMs) exhibit exceptional performance in visual-language reasoning across diverse cross-modal benchmarks. Despite these advances, recent research indicates that Large Language Models (LLMs), like GPT-3.5-turbo, underachieve compared to well-trained smaller models, such as BERT, in Fake News Detection (FND), prompting inquiries into LVLMs' efficacy in FND tasks. Although performance could improve through fine-tuning LVLMs, the substantial parameters and requisite pre-trained weights render it a resource-heavy endeavor for FND applications. This paper initially assesses the FND capabilities of two notable LVLMs, CogVLM and GPT4V, in comparison to a smaller yet adeptly trained CLIP model in a zero-shot context. The findings demonstrate that LVLMs can attain performance competitive with that of the smaller model. Next, we integrate standard in-context learning…

Taxonomy

Topics: Misinformation and Its Impacts

Methods: Attention Is All You Need · Cosine Annealing · Linear Warmup With Cosine Annealing · Residual Connection · Contrastive Language-Image Pre-training · Byte Pair Encoding · Layer Normalization