M3D-Net: Multi-Modal 3D Facial Feature Reconstruction Network for Deepfake Detection
Haotian Wu, Yue Cheng, Shan Bian

TL;DR
This paper introduces M3D-Net, a multi-modal 3D facial reconstruction network that improves deepfake detection by leveraging multi-scale features and attention-based fusion of RGB and 3D data.
Contribution
It presents a novel end-to-end dual-stream architecture with modules for 3D facial reconstruction, feature pre-fusion, and multi-modal fusion, advancing deepfake detection capabilities.
Findings
Achieves state-of-the-art detection accuracy on multiple datasets.
Demonstrates robustness and generalization across diverse scenarios.
Significantly outperforms existing deepfake detection methods.
Abstract
With the rapid advancement of deep learning in image generation, facial forgery techniques have achieved unprecedented realism, posing serious threats to cybersecurity and information authenticity. Most existing deepfake detection approaches rely on the reconstruction of isolated facial attributes without fully exploiting the complementary nature of multi-modal feature representations. To address these challenges, this paper proposes a novel Multi-Modal 3D Facial Feature Reconstruction Network (M3D-Net) for deepfake detection. Our method leverages an end-to-end dual-stream architecture that reconstructs fine-grained facial geometry and reflectance properties from single-view RGB images via a self-supervised 3D facial reconstruction module. The network further enhances detection performance through a 3D Feature Pre-fusion Module (PFM), which adaptively adjusts multi-scale features, and a…
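The TL;DR mentions attention-based fusion of RGB and 3D data, but this summary does not give the fusion module's actual design. As a rough illustration only, the sketch below fuses two feature vectors with softmax attention weights. The function name `attention_fuse`, the `query` vector, and the scaled dot-product scoring are all assumptions made for illustration; they are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(rgb_feat, geo_feat, query):
    """Fuse two equal-length feature vectors with a softmax-weighted sum.

    Hypothetical helper, not the paper's module: each modality is scored
    by a scaled dot product with `query`, the two scores are normalised
    with softmax, and the features are blended with the resulting weights.
    """
    if not (len(rgb_feat) == len(geo_feat) == len(query)):
        raise ValueError("all vectors must have the same length")
    scale = math.sqrt(len(query))
    scores = [
        sum(q * f for q, f in zip(query, feat)) / scale
        for feat in (rgb_feat, geo_feat)
    ]
    w_rgb, w_geo = softmax(scores)
    fused = [w_rgb * r + w_geo * g for r, g in zip(rgb_feat, geo_feat)]
    return fused, (w_rgb, w_geo)


# Toy inputs: the query scores both modalities equally, so each gets
# the same attention weight and the fused vector is their average.
fused, (w_rgb, w_geo) = attention_fuse([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])
print(fused, w_rgb, w_geo)
```

In a trained network the query and the feature vectors would be learned rather than fixed, and fusion would typically operate on feature maps at several scales rather than on a single vector.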
