ForensicFlow: A Tri-Modal Adaptive Network for Robust Deepfake Detection
Mohammad Romani

TL;DR
ForensicFlow is a tri-modal deepfake detection network that fuses visual, texture, and spectral cues with adaptive weighting, achieving high accuracy and robustness against sophisticated forgeries.
Contribution
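The paper introduces a multi-domain fusion network that combines three forensic dimensions (global visual inconsistencies, fine-grained texture anomalies, and spectral noise patterns) with adaptive per-branch weighting, and reports better results than single-stream detectors on CelebDF(v2).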
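As a rough illustration of what adaptive weighting across three branches can look like, the sketch below fuses per-branch feature vectors with softmax-normalized weights. It is a minimal sketch under stated assumptions, not the paper's implementation: the names `softmax`, `adaptive_fusion`, and `gate_scores` are hypothetical, the feature vectors are toy values, and in a trained network the gate scores would presumably come from a learned layer rather than being passed in by hand.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def adaptive_fusion(branch_features, gate_scores):
    """Fuse equally sized branch feature vectors with softmax weights.

    branch_features: one feature vector per branch (visual, texture, spectral).
    gate_scores: one raw score per branch (hypothetical; assumed to be
                 produced by a learned gating layer in a real model).
    Returns the fused vector and the normalized branch weights.
    """
    weights = softmax(gate_scores)
    dim = len(branch_features[0])
    fused = [
        sum(w * feat[i] for w, feat in zip(weights, branch_features))
        for i in range(dim)
    ]
    return fused, weights

# Toy inputs (illustrative values only, not from the paper).
visual = [1.0, 0.0]
texture = [0.0, 1.0]
spectral = [1.0, 1.0]

# Equal gate scores give equal weights, so the result is a plain average.
fused, weights = adaptive_fusion([visual, texture, spectral], [0.0, 0.0, 0.0])
print(weights, fused)
```

Raising one gate score relative to the others shifts the fused vector toward that branch, which is the behavior the adaptive weighting is meant to provide when one kind of evidence is more informative for a given forgery.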
Findings
Achieves AUC 0.9752, F1 0.9408, accuracy 0.9208 on CelebDF(v2)
Outperforms single-stream detectors in accuracy
Ablation studies confirm branch synergy, and Grad-CAM visualizations show focus on genuine manipulation regions (e.g., facial boundaries)
Abstract
Modern deepfakes evade detection by leaving subtle, domain-specific artifacts that single-branch networks miss. ForensicFlow addresses this by fusing evidence across three forensic dimensions: global visual inconsistencies (via ConvNeXt-tiny), fine-grained texture anomalies (via Swin Transformer-tiny), and spectral noise patterns (via CNN with channel attention). Our attention-based temporal pooling dynamically prioritizes high-evidence frames, while adaptive fusion weights each branch according to forgery type. Trained on CelebDF(v2) with Focal Loss, the model achieves AUC 0.9752, F1 0.9408, and accuracy 0.9208, outperforming single-stream detectors. Ablation studies confirm branch synergy, and Grad-CAM visualizations validate focus on genuine manipulation regions (e.g., facial boundaries). This multi-domain fusion strategy establishes robustness against increasingly sophisticated…
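Two mechanisms named in the abstract, attention-based temporal pooling and Focal Loss, can be sketched in a few lines. This is a minimal sketch, not the authors' code: `attention_pool` and `focal_loss` are hypothetical names, the per-frame scores are assumed to come from a learned scoring layer, and the `alpha=0.25`, `gamma=2.0` defaults are common choices for binary focal loss rather than values stated in the abstract.

```python
import math

def attention_pool(frame_features, frame_scores):
    """Pool per-frame feature vectors into one clip-level vector.

    frame_scores are raw per-frame evidence scores (hypothetical; assumed
    to be produced by a learned layer). A softmax turns them into weights,
    so higher-scoring frames contribute more to the pooled vector.
    """
    m = max(frame_scores)
    exps = [math.exp(s - m) for s in frame_scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(frame_features[0])
    pooled = [
        sum(w * f[i] for w, f in zip(weights, frame_features))
        for i in range(dim)
    ]
    return pooled, weights

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss for one example.

    p: predicted probability of the positive class (e.g. "fake").
    y: ground-truth label, 1 or 0.
    alpha, gamma: assumed hyperparameters, not taken from the paper.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, 1e-12), 1.0)  # guard against log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# Toy clip with two frames (illustrative values only).
frames = [[0.0, 2.0], [1.0, 0.0]]
pooled, frame_weights = attention_pool(frames, [0.0, 0.0])
print(pooled, frame_weights)

# A confident correct prediction is penalized less than an uncertain one.
print(focal_loss(0.9, 1), focal_loss(0.6, 1))
```

The `(1 - p_t) ** gamma` factor shrinks the loss on examples the model already classifies well, so training concentrates on the harder ones; setting `gamma=0` reduces the expression to alpha-weighted cross-entropy.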
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection · Adversarial Robustness in Machine Learning
