A Cross-Modal Rumor Detection Scheme via Contrastive Learning by Exploring Text and Image internal Correlations

Bin Ma; Yifei Zhang; Yongjin Xian; Qi Li; Linna Zhou; Gongxun Miao

arXiv:2508.11141·cs.CV·August 18, 2025

TL;DR

This paper introduces a novel cross-modal rumor detection method leveraging contrastive learning to explore internal correlations between text and multi-scale image features, significantly improving detection accuracy.

Contribution

It proposes a multi-scale image and context correlation exploration algorithm using contrastive learning, mutual information maximization, and adaptive fusion for enhanced rumor detection.
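This page does not spell out the contrastive objective. As a minimal sketch, assuming a CLIP-style symmetric InfoNCE loss (a common choice for contrastive text-image training, and a known lower bound on mutual information up to a constant), the idea of pulling matched text-image pairs together could look like the following. The function names, the temperature value, and the random embeddings are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_info_nce(text_emb, image_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch where row i of each input is a matched pair.

    Assumption: temperature=0.07 is a conventional default, not a value
    taken from the paper.
    """
    # L2-normalize so the dot product is a cosine similarity.
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    v = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)

    logits = t @ v.T / temperature          # (batch, batch) similarity matrix
    diag = np.arange(len(t))                # matched pairs sit on the diagonal

    loss_text_to_image = -log_softmax(logits, axis=1)[diag, diag].mean()
    loss_image_to_text = -log_softmax(logits, axis=0)[diag, diag].mean()
    return 0.5 * (loss_text_to_image + loss_image_to_text)


# Toy data: "aligned" images are noisy copies of the text embeddings,
# "unrelated" images are independent random vectors.
rng = np.random.default_rng(0)
text = rng.standard_normal((8, 32))
aligned_images = text + 0.01 * rng.standard_normal((8, 32))
unrelated_images = rng.standard_normal((8, 32))

aligned_loss = symmetric_info_nce(text, aligned_images)
unrelated_loss = symmetric_info_nce(text, unrelated_images)
```

In this toy setup the loss for the aligned batch is lower than for the unrelated batch, which is the behavior a contrastive objective is meant to reward.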

Findings

1. Achieves superior performance over state-of-the-art methods
2. Effectively captures cross-modal relevance between text and images
3. Demonstrates robustness on real-world datasets

Abstract

Existing rumor detection methods often neglect the content within images as well as the inherent relationships between contexts and images across different visual scales, thereby resulting in the loss of critical information pertinent to rumor identification. To address these issues, this paper presents a novel cross-modal rumor detection scheme based on contrastive learning, namely the Multi-scale Image and Context Correlation exploration algorithm (MICC). Specifically, we design an SCLIP encoder to generate unified semantic embeddings for text and multi-scale image patches through contrastive pretraining, enabling their relevance to be measured via dot-product similarity. Building upon this, a Cross-Modal Multi-Scale Alignment module is introduced to identify image regions most relevant to the textual semantics, guided by mutual information maximization and the information bottleneck…
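The abstract describes measuring text-to-patch relevance by dot-product similarity in a shared embedding space across several visual scales. A rough sketch of that step is shown below, under loud assumptions: `toy_patch_encoder` is a hypothetical stand-in for the image side of the SCLIP encoder (it only mean-pools pixels and applies a random projection), the text embedding is random, and the patch sizes and `top_k` are illustrative. It is not the authors' Cross-Modal Multi-Scale Alignment module, which the abstract says is additionally guided by mutual information maximization and the information bottleneck.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def split_into_patches(image, patch):
    """Cut an (H, W, C) image into non-overlapping patch x patch tiles."""
    h, w, c = image.shape
    rows, cols = h // patch, w // patch
    tiles = image[: rows * patch, : cols * patch].reshape(rows, patch, cols, patch, c)
    return tiles.transpose(0, 2, 1, 3, 4).reshape(rows * cols, patch, patch, c)


def toy_patch_encoder(patches, proj):
    """Hypothetical image encoder: mean-pool each patch, then project to D dims."""
    pooled = patches.mean(axis=(1, 2))      # (num_patches, C)
    return l2_normalize(pooled @ proj)      # (num_patches, D)


def rank_patches(text_emb, image, scales, proj, top_k=3):
    """Score every patch at every scale against the text; return the top_k.

    Each result is (score, patch_size, patch_index).
    """
    results = []
    for size in scales:
        patch_emb = toy_patch_encoder(split_into_patches(image, size), proj)
        scores = patch_emb @ text_emb       # dot-product similarity per patch
        results.extend((float(s), size, i) for i, s in enumerate(scores))
    results.sort(key=lambda r: r[0], reverse=True)
    return results[:top_k]


# Toy inputs (all assumptions): a random image, projection, and text embedding.
rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))
proj = rng.standard_normal((3, 16))
text_emb = l2_normalize(rng.standard_normal(16))

top = rank_patches(text_emb, image, scales=(16, 32, 64), proj=proj, top_k=3)
```

Because both sides are L2-normalized, each score is a cosine similarity, so patches from different scales can be compared on the same footing before the most text-relevant regions are selected.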
