Unleashing the Potential of Consistency Learning for Detecting and Grounding Multi-Modal Media Manipulation

Yiheng Li; Yang Yang; Zichang Tan; Huan Liu; Weihua Chen; Xu Zhou; Zhen Lei

arXiv:2506.05890·cs.CV·June 9, 2025

TL;DR

This paper introduces Contextual-Semantic Consistency Learning (CSCL), a consistency learning framework for detecting and grounding multi-modal media manipulation (DGM4). It improves fine-grained forgery perception and reports state-of-the-art results on the DGM4 datasets.

Contribution

The paper proposes Contextual-Semantic Consistency Learning (CSCL). An image branch and a text branch each cascade a Contextual Consistency Decoder (CCD) and a Semantic Consistency Decoder (SCD) to detect fine-grained inconsistencies in local content.

Findings

1. CSCL achieves new state-of-the-art performance on DGM4 datasets.

2. The method improves grounding accuracy of manipulated content.

3. Extensive experiments validate the effectiveness of fine-grained consistency modeling.

Abstract

To tackle the threat of fake news, the task of detecting and grounding multi-modal media manipulation (DGM4) has received increasing attention. However, most state-of-the-art methods fail to explore the fine-grained consistency within local content, which usually results in an inadequate perception of detailed forgery and unreliable results. In this paper, we propose a novel approach named Contextual-Semantic Consistency Learning (CSCL) to enhance the fine-grained perception ability of forgery for DGM4. Two branches, one for the image modality and one for the text modality, are established. Each contains two cascaded decoders, i.e., a Contextual Consistency Decoder (CCD) and a Semantic Consistency Decoder (SCD), which capture within-modality contextual consistency and across-modality semantic consistency, respectively. Both CCD and SCD adhere to the same criteria for capturing fine-grained forgery details. To be…
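The two-stage idea in the abstract (within-modality consistency first, then across-modality consistency) can be illustrated with a toy sketch. This is not the authors' CCD or SCD: the abstract does not describe how the decoders work, so the sketch substitutes plain cosine similarity for both stages. All function names, the mean and max aggregation, and the multiplicative chaining of the two stages are assumptions made for illustration only. The token vectors are made-up two-dimensional examples, and both token lists are assumed to be non-empty.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def contextual_scores(tokens):
    """Stage 1 (hypothetical stand-in for CCD): mean similarity of each
    token to the other tokens of the SAME modality."""
    scores = []
    for i, tok in enumerate(tokens):
        sims = [cosine(tok, other) for j, other in enumerate(tokens) if j != i]
        scores.append(sum(sims) / len(sims) if sims else 1.0)
    return scores


def semantic_scores(tokens, other_modality, stage1):
    """Stage 2 (hypothetical stand-in for SCD): best match of each token in
    the OTHER modality, scaled by the stage-1 score so the stages are chained."""
    out = []
    for tok, ctx in zip(tokens, stage1):
        best = max(cosine(tok, o) for o in other_modality)
        out.append(ctx * best)
    return out


def branch(tokens, other_modality):
    """One modality branch: stage 1 followed by stage 2."""
    return semantic_scores(tokens, other_modality, contextual_scores(tokens))


# Toy embeddings (assumed): the third image patch points away from everything else.
image_tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
text_tokens = [[1.0, 0.0], [0.8, 0.2]]

image_scores = branch(image_tokens, text_tokens)
text_scores = branch(text_tokens, image_tokens)

# Index of the image patch with the lowest consistency score.
suspect_patch = image_scores.index(min(image_scores))
```

In this toy setup a low score marks a token that agrees neither with its own modality nor with the other one, which is the kind of local inconsistency the abstract says the cascaded decoders are meant to expose. A learned model would replace the fixed similarity with trained decoders.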

Code & Models

Repositories

liyih/cscl (PyTorch, official)

Taxonomy

Topics: Misinformation and Its Impacts · Generative Adversarial Networks and Image Synthesis · Hate Speech and Cyberbullying Detection