GEM-TFL: Bridging Weak and Full Supervision for Forgery Localization through EM-Guided Decomposition and Temporal Refinement

Xiaodong Zhu; Yuanming Zheng; Suting Wang; Junqi Yang; Yuhong Yang; Weiping Tu; Zhongyuan Wang

arXiv:2603.05095·cs.CV·March 6, 2026

TL;DR

GEM-TFL introduces a novel framework that improves weakly supervised temporal forgery localization by leveraging EM-based attribute reformulation, temporal refinement, and graph-based proposal modeling, achieving results close to fully supervised methods.

Contribution

The paper presents GEM-TFL, a two-phase framework that bridges the supervision gap in temporal forgery localization using EM optimization, temporal consistency refinement, and graph-based proposal modeling.

Findings

1. Achieves more accurate forgery localization than previous weakly supervised methods.

2. Narrows the performance gap between weakly and fully supervised approaches.

3. Demonstrates robustness and effectiveness on benchmark datasets.

Abstract

Temporal Forgery Localization (TFL) aims to precisely identify manipulated segments within videos or audio streams, providing interpretable evidence for multimedia forensics and security. While most existing TFL methods rely on dense frame-level labels in a fully supervised manner, Weakly Supervised TFL (WS-TFL) reduces labeling cost by learning only from binary video-level labels. However, current WS-TFL approaches suffer from mismatched training and inference objectives, limited supervision from binary labels, gradient blockage caused by non-differentiable top-k aggregation, and the absence of explicit modeling of inter-proposal relationships. To address these issues, we propose GEM-TFL (Graph-based EM-powered Temporal Forgery Localization), a two-phase classification-regression framework that effectively bridges the supervision gap between training and inference. Built upon this…
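
To make the top-k aggregation mentioned in the abstract concrete, here is a minimal sketch of generic hard top-k mean pooling, the kind of step commonly used to turn frame-level scores into a single video-level score under weak supervision. It is not taken from the paper: the function name `topk_mean`, the example scores, and the choice of `k` are all illustrative assumptions.

```python
def topk_mean(frame_scores, k):
    """Average the k highest frame-level scores into one video-level score.

    Generic illustration only; not GEM-TFL's actual aggregation.
    """
    if not 1 <= k <= len(frame_scores):
        raise ValueError("k must be between 1 and the number of frames")
    # Hard selection: indices of the k highest-scoring frames.
    selected = sorted(
        range(len(frame_scores)),
        key=lambda i: frame_scores[i],
        reverse=True,
    )[:k]
    video_score = sum(frame_scores[i] for i in selected) / k
    return video_score, sorted(selected)


# Hypothetical per-frame forgery scores for a 6-frame clip.
scores = [0.1, 0.9, 0.8, 0.2, 0.7, 0.1]
video_score, used = topk_mean(scores, k=3)
print(video_score, used)
```

In this sketch the video-level score depends only on the frames listed in `used`; the remaining frames do not contribute to it, which is one way to picture why a hard selection step limits the training signal reaching most frames.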

Taxonomy

Topics: Digital Media Forensic Detection · Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning