Detecting and Grounding Multi-Modal Media Manipulation and Beyond

Rui Shao; Tianxing Wu; Jianlong Wu; Liqiang Nie; Ziwei Liu

arXiv:2309.14203 · cs.CV · September 26, 2023

TL;DR

This paper introduces a new problem of detecting and grounding multi-modal media manipulation, proposing a novel dataset and a hierarchical transformer model to analyze subtle cross-modal forgery traces.

Contribution

It presents the first large-scale dataset for multi-modal fake media detection and grounding, along with a novel hierarchical transformer model, HAMMER, for fine-grained manipulation reasoning across modalities.

Findings

1. HAMMER outperforms existing methods in manipulation detection and grounding.
2. HAMMER++ achieves further improvements with contrastive learning.
3. The dataset enables comprehensive evaluation of multi-modal manipulation detection.
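The second finding credits contrastive learning for HAMMER++'s gains. As a generic illustration only, the sketch below computes a symmetric image-text InfoNCE loss (the CLIP-style objective) over a batch of paired embeddings, in pure Python. It is not the paper's own contrastive loss, whose exact form is not given on this page; the function names and the default temperature of 0.07 are illustrative assumptions.

```python
import math
from typing import List

Vector = List[float]


def _normalize(v: Vector) -> Vector:
    """Scale a vector to unit L2 norm (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _log_softmax_at(logits: List[float], idx: int) -> float:
    """Numerically stable log-softmax of `logits`, evaluated at position `idx`."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return logits[idx] - log_sum


def info_nce(image_emb: List[Vector], text_emb: List[Vector],
             temperature: float = 0.07) -> float:
    """Symmetric InfoNCE: pair i is the positive, all other pairs in the batch are negatives.

    Generic CLIP-style objective, shown for illustration; not HAMMER's loss.
    """
    if len(image_emb) != len(text_emb):
        raise ValueError("image and text batches must have the same size")
    imgs = [_normalize(v) for v in image_emb]
    txts = [_normalize(v) for v in text_emb]
    n = len(imgs)
    # sim[i][j] = cosine(image_i, text_j) / temperature
    sim = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
            for j in range(n)] for i in range(n)]
    # Image-to-text: each row should put its mass on the diagonal entry.
    i2t = -sum(_log_softmax_at(sim[i], i) for i in range(n)) / n
    # Text-to-image: each column should put its mass on the diagonal entry.
    t2i = -sum(_log_softmax_at([sim[j][i] for j in range(n)], i)
               for i in range(n)) / n
    return 0.5 * (i2t + t2i)
```

With matching pairs on the diagonal the loss approaches zero, while swapping the text embeddings makes it large; lowering the temperature sharpens that contrast.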

Abstract

Misinformation has become a pressing issue. Fake media, in both visual and textual forms, is widespread on the web. While various deepfake detection and text fake news detection methods have been proposed, they are designed only for single-modality forgery based on binary classification and cannot analyze or reason about subtle forgery traces across different modalities. In this paper, we highlight a new research problem for multi-modal fake media, namely Detecting and Grounding Multi-Modal Media Manipulation (DGM^4). DGM^4 aims not only to determine the authenticity of multi-modal media but also to ground the manipulated content, which requires deeper reasoning about multi-modal media manipulation. To support a large-scale investigation, we construct the first DGM^4 dataset, in which image-text pairs are manipulated by various approaches and richly annotated with diverse manipulations. Moreover,…
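DGM^4 couples two outputs: a binary authenticity decision and a grounding of the manipulated content. As a minimal sketch, assuming image grounding is an axis-aligned bounding box and text grounding is a per-token binary mask (both representations are assumptions, not taken from this page), the snippet below scores one prediction with box IoU and token-level F1. The `GroundingPrediction` type and the `score` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumed box convention: (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
Box = Tuple[float, float, float, float]


@dataclass
class GroundingPrediction:
    """Hypothetical container for one DGM^4-style output."""
    is_fake: bool              # binary authenticity decision
    image_box: Optional[Box]   # manipulated image region, or None if none
    token_mask: List[bool]     # True where a text token is judged manipulated


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def token_f1(pred: List[bool], gold: List[bool]) -> float:
    """F1 over manipulated-token labels, computed as 2TP / (2TP + FP + FN)."""
    if len(pred) != len(gold):
        raise ValueError("token masks must have equal length")
    tp = sum(p and g for p, g in zip(pred, gold))
    fp = sum(p and not g for p, g in zip(pred, gold))
    fn = sum(g and not p for p, g in zip(pred, gold))
    if tp + fp + fn == 0:
        return 1.0  # convention: nothing manipulated and nothing flagged
    return 2 * tp / (2 * tp + fp + fn)


def score(pred: GroundingPrediction, gold: GroundingPrediction) -> dict:
    """Detection correctness plus image and text grounding quality."""
    if pred.image_box is None or gold.image_box is None:
        # Both empty counts as a match; exactly one empty counts as a miss.
        iou = 1.0 if pred.image_box is None and gold.image_box is None else 0.0
    else:
        iou = box_iou(pred.image_box, gold.image_box)
    return {
        "detection_correct": pred.is_fake == gold.is_fake,
        "image_iou": iou,
        "text_f1": token_f1(pred.token_mask, gold.token_mask),
    }
```

Treating "both empty" as a perfect match is a choice made for this sketch; an evaluation protocol could instead skip such samples or score grounding only on manipulated pairs.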

Code & Models

Repositories

rshaojimmy/multimodal-deepfake
PyTorch · Official

Taxonomy

Topics: Misinformation and Its Impacts · Advanced Malware Detection Techniques · Spam and Phishing Detection

Methods: Contrastive Learning