Mining Forgery Traces from Reconstruction Error: A Weakly Supervised Framework for Multimodal Deepfake Temporal Localization

Midou Guo; Qilin Yin; Wei Lu; Rui Yang

arXiv:2601.21458 · cs.CV · May 19, 2026

TL;DR

This paper introduces RT-DeepLoc, a weakly supervised framework that temporally localizes deepfake manipulations using the reconstruction errors of a Masked Autoencoder trained only on authentic data, without requiring frame-level annotations.

Contribution

The authors propose a novel reconstruction-based approach with a new contrastive loss for effective weakly supervised deepfake localization, outperforming existing methods.

Findings

01. Achieves state-of-the-art results on large-scale datasets.

02. Effectively localizes forgeries without dense annotations.

03. Robustly generalizes to unseen forgery methods.

Abstract

Modern deepfakes have evolved into localized and intermittent manipulations that require fine-grained temporal localization to mitigate severe digital security risks. The prohibitive cost of frame-level annotation makes weakly supervised methods, which rely only on video-level labels, a practical necessity. To this end, we propose Reconstruction-based Temporal Deepfake Localization (RT-DeepLoc), a weakly supervised temporal forgery localization framework that identifies forgeries via reconstruction errors. Our framework uses a Masked Autoencoder (MAE) trained exclusively on authentic data to learn the intrinsic spatiotemporal patterns of that data; this allows the model to produce significant reconstruction discrepancies for forged segments, effectively providing the missing fine-grained cues for accurate localization without demanding dense human annotations. To robustly leverage these indicators,…
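The core idea in the abstract, that a model trained only on authentic data reconstructs forged segments poorly, can be illustrated with a toy sketch. This is not the authors' RT-DeepLoc pipeline: the function names `frame_errors` and `segments_above`, the mean-squared-error measure, and the fixed threshold are all assumptions made for illustration, and the "reconstructions" here are hand-written numbers rather than the output of a Masked Autoencoder.

```python
from typing import List, Sequence, Tuple


def frame_errors(original: Sequence[Sequence[float]],
                 reconstructed: Sequence[Sequence[float]]) -> List[float]:
    """Mean squared reconstruction error for each frame-level feature vector.

    Assumption: each frame is represented by a flat feature vector, and the
    reconstruction has the same shape as the original.
    """
    errors = []
    for x, x_hat in zip(original, reconstructed):
        errors.append(sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x))
    return errors


def segments_above(errors: Sequence[float],
                   threshold: float) -> List[Tuple[int, int]]:
    """Group consecutive frames whose error exceeds `threshold` into
    (start, end) index pairs, with `end` inclusive.

    Assumption: a fixed threshold stands in for whatever decision rule a
    real system would learn.
    """
    segments = []
    start = None
    for i, err in enumerate(errors):
        if err > threshold and start is None:
            start = i  # a high-error run begins here
        elif err <= threshold and start is not None:
            segments.append((start, i - 1))  # the run ended on the previous frame
            start = None
    if start is not None:
        segments.append((start, len(errors) - 1))  # run extends to the last frame
    return segments


# Toy data: frames 2 and 3 are reconstructed badly, mimicking a forged segment.
original = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
reconstructed = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0], [5.0, 5.0], [4.0, 4.0]]

errors = frame_errors(original, reconstructed)
print(segments_above(errors, threshold=1.0))  # prints [(2, 3)]
```

In this toy setting the two poorly reconstructed frames form a single candidate segment. A fixed threshold is the simplest possible decision rule; it is used here only to make the link between reconstruction error and temporal localization concrete.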

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection · Adversarial Robustness in Machine Learning