M4-BLIP: Advancing Multi-Modal Media Manipulation Detection through Face-Enhanced Local Analysis
Hang Wu, Ke Sun, Jiayi Ji, Xiaoshuai Sun, Rongrong Ji

TL;DR
M4-BLIP is a novel multi-modal media manipulation detection framework that leverages face-enhanced local analysis and integrates with large language models to improve accuracy and interpretability.
Contribution
The paper introduces M4-BLIP, a framework that combines local facial features with global analysis based on BLIP-2 and integrates LLMs to make media manipulation detection more interpretable.
Findings
Outperforms state-of-the-art detection methods.
Effectively integrates local facial and global features.
Enhances interpretability with LLM integration.
Abstract
In the contemporary digital landscape, multi-modal media manipulation has emerged as a significant societal threat, impacting the reliability and integrity of information dissemination. Current detection methodologies in this domain often overlook the crucial aspect of localized information, despite the fact that manipulations frequently occur in specific areas, particularly in facial regions. In response to this critical observation, we propose the M4-BLIP framework. This innovative framework utilizes the BLIP-2 model, renowned for its ability to extract local features, as the cornerstone for feature extraction. Complementing this, we incorporate local facial information as prior knowledge. A specially designed alignment and fusion module within M4-BLIP meticulously integrates these local and global features, creating a harmonious blend that enhances detection accuracy. Furthermore,…
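The abstract does not describe how the alignment and fusion module combines local and global features. The sketch below is illustrative only and assumes a cross-attention design, which the abstract does not confirm. Each face-region token queries a set of global image tokens (for example, tokens from BLIP-2's image encoder). The attended context is added back to the face token as a residual, and the fused tokens are mean-pooled into one vector. The functions `cross_attend` and `fuse_local_global` are hypothetical names, and toy 2-D vectors stand in for real embeddings.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product attention of one query over key/value tokens."""
    d = len(query)
    weights = softmax([dot(query, k) / math.sqrt(d) for k in keys])
    # Weighted sum of the value tokens, one output dimension at a time.
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]


def fuse_local_global(face_tokens: List[Vector], global_tokens: List[Vector]) -> Vector:
    """Hypothetical fusion (an assumption, not the paper's module): each face
    token attends over the global tokens, the attended context is added back
    residually, and the fused face tokens are mean-pooled into one vector."""
    fused = []
    for f in face_tokens:
        ctx = cross_attend(f, global_tokens, global_tokens)
        fused.append([a + b for a, b in zip(f, ctx)])
    n = len(fused)
    return [sum(tok[i] for tok in fused) / n for i in range(len(fused[0]))]


# Toy usage: one face token attending over a single global token.
print(fuse_local_global([[1.0, 2.0]], [[0.5, 0.5]]))  # [1.5, 2.5]
```

In this sketch the local features act as queries, so the facial prior decides which global context gets pulled in. A real detector would use learned projections and a trained classification head on top of the pooled vector.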
Taxonomy
Topics: Digital Media Forensic Detection · Generative Adversarial Networks and Image Synthesis · Face recognition and analysis
