A Multimodal Deviation Perceiving Framework for Weakly-Supervised Temporal Forgery Localization

Wenbo Xu; Junyan Wu; Wei Lu; Xiangyang Luo; Qian Wang

arXiv:2507.16596·cs.CV·August 5, 2025

TL;DR

This paper introduces a multimodal framework for weakly-supervised temporal forgery localization in videos. It uses cross-modal attention and a deviation perceiving loss to identify forged segments from video-level labels alone.

Contribution

It proposes a novel multimodal interaction mechanism and a deviation perceiving loss, which together refine the localization of forged segments without requiring segment-level annotations.

Findings

01

Achieves results comparable to fully-supervised methods.

02

Effectively identifies forged segments using only video-level labels.

03

Demonstrates robustness across multiple evaluation metrics.

Abstract

Current research on Deepfake forensics often treats detection as a classification task or a temporal forgery localization problem; these formulations are usually restrictive, time-consuming, and challenging to scale to large datasets. To resolve these issues, we present a multimodal deviation perceiving framework for weakly-supervised temporal forgery localization (MDP), which aims to identify temporally partial forged segments using only video-level annotations. MDP proposes a novel multimodal interaction mechanism (MI) and an extensible deviation perceiving loss to perceive multimodal deviation, which yields refined localization of the start and end timestamps of forged segments. Specifically, MI introduces a temporal-property-preserving cross-modal attention to measure the relevance between the visual and audio modalities in the probabilistic embedding space. It could identify the inter-modality…
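To make the idea of cross-modal attention over time more concrete, here is a minimal sketch in Python with NumPy. It is not the paper's MI mechanism: the abstract describes attention in a probabilistic embedding space, whereas this sketch uses plain deterministic scaled dot-product attention. The function names (`cross_modal_attention`, `deviation_scores`, `scores_to_segments`), the cosine-based deviation score, the threshold, and the frame rate are all hypothetical choices made for illustration.

```python
import numpy as np

def cross_modal_attention(visual, audio):
    """Scaled dot-product attention: visual queries, audio keys/values.

    visual, audio: arrays of shape (T, d). The output has shape (T, d),
    so the temporal length T is preserved (an assumption of this sketch).
    """
    d = visual.shape[1]
    scores = visual @ audio.T / np.sqrt(d)            # (T, T) relevance
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ audio

def deviation_scores(visual, audio):
    """Per-frame deviation as 1 - cosine similarity (hypothetical score)."""
    attended = cross_modal_attention(visual, audio)
    num = (visual * attended).sum(axis=1)
    den = np.linalg.norm(visual, axis=1) * np.linalg.norm(attended, axis=1) + 1e-8
    return 1.0 - num / den

def scores_to_segments(scores, threshold, fps):
    """Group consecutive above-threshold frames into (start_sec, end_sec)."""
    segments = []
    start = None
    for t, s in enumerate(scores):
        if s > threshold and start is None:
            start = t                                  # segment opens
        elif s <= threshold and start is not None:
            segments.append((start / fps, t / fps))    # segment closes
            start = None
    if start is not None:                              # segment runs to the end
        segments.append((start / fps, len(scores) / fps))
    return segments
```

In this sketch, frames where the visual feature disagrees with its audio-attended counterpart receive a high deviation score, and `scores_to_segments` converts a run of such frames into start and end timestamps. How MDP actually trains these scores from video-level labels is not specified in the truncated abstract and is left out here.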
