Fine-grained Multiple Supervisory Network for Multi-modal Manipulation Detecting and Grounding
Xinquan Yu, Wei Lu, Xiangyang Luo

TL;DR
This paper introduces the Fine-grained Multiple Supervisory (FMS) network for detecting and localizing manipulation in multi-modal media, and reports higher accuracy than existing methods.
Contribution
It proposes a novel FMS network with modality reliability, unimodal internal, and cross-modal supervision modules for fine-grained manipulation detection.
Findings
Outperforms state-of-the-art methods in manipulation detection accuracy.
Effectively localizes forgery content and classifies forgery methods.
Enhances robustness against unreliable unimodal data.
Abstract
The task of Detecting and Grounding Multi-Modal Media Manipulation (DGM) is a branch of misinformation detection. Unlike traditional binary classification, it includes complex subtasks such as forgery content localization and forgery method classification. Existing methods are often limited in performance because they neglect the erroneous interference caused by unreliable unimodal data and fail to establish comprehensive forgery supervision for mining fine-grained tampering traces. In this paper, we present a Fine-grained Multiple Supervisory (FMS) network, which incorporates modality reliability supervision, unimodal internal supervision and cross-modal supervision to provide comprehensive guidance for DGM detection. For modality reliability supervision, we propose the Multimodal Decision Supervised Correction (MDSC) module. It leverages unimodal weak supervision…
Taxonomy
Topics: Digital Media Forensic Detection · Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications
