M3D-Net: Multi-Modal 3D Facial Feature Reconstruction Network for Deepfake Detection

Haotian Wu; Yue Cheng; Shan Bian

arXiv:2604.14574 · cs.CV · April 17, 2026

TL;DR

This paper introduces M3D-Net, a multi-modal 3D facial reconstruction network that improves deepfake detection by leveraging multi-scale features and attention-based fusion of RGB and 3D data.

Contribution

The paper presents a novel end-to-end dual-stream architecture with dedicated modules for 3D facial reconstruction, 3D feature pre-fusion, and multi-modal fusion, with the aim of improving deepfake detection.
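The page does not say how the multi-modal fusion module combines the two streams beyond describing the fusion as attention-based. As an illustration only, the sketch below assumes one common form of attention-based fusion: cross-attention in which RGB feature tokens query 3D-geometry feature tokens, with identity projections and a residual connection. The function names (`cross_attention_fuse`, `softmax`, `dot`) and the token layout are hypothetical and are not M3D-Net's actual design.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_attention_fuse(rgb_tokens: List[Vector], geo_tokens: List[Vector]) -> List[Vector]:
    """Hypothetical fusion step: every RGB token (query) attends over all
    3D-geometry tokens (keys and values), and the attended geometry context
    is added back to the RGB token as a residual.

    Identity query/key/value projections are assumed to keep the sketch
    short; a trained network would use learned projection matrices.
    """
    dim = len(rgb_tokens[0])
    if any(len(t) != dim for t in rgb_tokens + geo_tokens):
        raise ValueError("all tokens must share one feature dimension")
    scale = 1.0 / math.sqrt(dim)
    fused = []
    for query in rgb_tokens:
        # Scaled dot-product scores of this RGB token against every 3D token.
        weights = softmax([dot(query, key) * scale for key in geo_tokens])
        # Attention-weighted sum of the 3D tokens.
        context = [sum(w * v[i] for w, v in zip(weights, geo_tokens)) for i in range(dim)]
        # Residual connection: keep the RGB evidence, add the 3D context.
        fused.append([q + c for q, c in zip(query, context)])
    return fused
```

Cross-attention is chosen here because it lets each RGB region pull in whichever geometric cues are most similar to it, which is one way "complementary" modalities are commonly combined; the paper may use a different scheme.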

Findings

01. Achieves state-of-the-art detection accuracy on multiple datasets.
02. Demonstrates robustness and generalization across diverse scenarios.
03. Significantly outperforms existing deepfake detection methods.

Abstract

With the rapid advancement of deep learning in image generation, facial forgery techniques have achieved unprecedented realism, posing serious threats to cybersecurity and information authenticity. Most existing deepfake detection approaches rely on the reconstruction of isolated facial attributes without fully exploiting the complementary nature of multi-modal feature representations. To address these challenges, this paper proposes a novel Multi-Modal 3D Facial Feature Reconstruction Network (M3D-Net) for deepfake detection. Our method leverages an end-to-end dual-stream architecture that reconstructs fine-grained facial geometry and reflectance properties from single-view RGB images via a self-supervised 3D facial reconstruction module. The network further enhances detection performance through a 3D Feature Pre-fusion Module (PFM), which adaptively adjusts multi-scale features, and a…
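The abstract states only that the PFM "adaptively adjusts multi-scale features". One plausible reading, sketched below as an assumption rather than the paper's method, is input-dependent re-weighting: pool each scale to a scalar, multiply it by a per-scale gate weight (learned, in a real network), and apply a softmax across scales to obtain fusion weights. `prefuse_multiscale` and its `gate_weights` parameter are hypothetical names, and the features are assumed to be already resized to a common dimension.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def prefuse_multiscale(
    scale_features: Sequence[Sequence[float]],
    gate_weights: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Hypothetical pre-fusion of multi-scale features.

    Each scale is summarized by its mean activation (global average pooling),
    scaled by a per-scale gate weight, and the resulting scores are turned
    into softmax weights that re-weight and sum the scales.
    Returns (fused_feature, per_scale_weights).
    """
    if not scale_features or len(scale_features) != len(gate_weights):
        raise ValueError("need one gate weight per scale")
    dim = len(scale_features[0])
    if any(len(f) != dim for f in scale_features):
        raise ValueError("all scales must share one feature dimension")
    # Input-dependent score per scale: gate weight times pooled activation.
    scores = [g * (sum(f) / dim) for f, g in zip(scale_features, gate_weights)]
    alphas = softmax(scores)
    # Weighted sum of the scales, element by element.
    fused = [sum(a * f[i] for a, f in zip(alphas, scale_features)) for i in range(dim)]
    return fused, alphas
```

For example, `prefuse_multiscale([[1.0, 3.0], [3.0, 5.0]], [0.0, 0.0])` gives equal weights `[0.5, 0.5]` and the averaged feature `[2.0, 4.0]`; positive gate weights shift weight toward scales with larger mean activation, which is what makes the adjustment depend on the input.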
