FTFDNet: Learning to Detect Talking Face Video Manipulation with Tri-Modality Interaction

Ganglai Wang, Peng Zhang, Junwen Xiong, Feihan Yang, Wei Huang, and Yufei Zha

arXiv:2307.03990 · cs.CV · July 11, 2023

TL;DR

This paper introduces FTFDNet, a multi-modal deep learning model that combines visual, audio, and motion features with a novel attention mechanism to improve detection of fake talking face videos, especially those manipulated through lip synchronization.

Contribution

The paper proposes FTFDNet with a cross-modal fusion module and a new audio-visual attention mechanism, enhancing fake video detection by leveraging multi-modal cues and motion analysis.
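
The summary does not spell out how the audio-visual attention or the cross-modal fusion module is built, so the sketch below is only a generic stand-in under stated assumptions, not the authors' design. It uses standard scaled dot-product cross-attention in which each visual frame queries the audio frames (audio acts as both keys and values). It then concatenates the visual, attended-audio and motion features per frame and mean-pools them over time. The function names (`audio_visual_attention`, `fuse_tri_modal`), the shared visual/audio feature dimension, and the concatenate-then-pool fusion are all hypothetical choices made for illustration.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def audio_visual_attention(visual: List[Vector], audio: List[Vector]) -> List[Vector]:
    """Scaled dot-product cross-attention (hypothetical stand-in).

    Each visual frame is a query over all audio frames; the audio frames
    serve as both keys and values, so the output is one attended audio
    vector per visual frame. Assumes visual and audio vectors share a dimension.
    """
    d = len(audio[0])
    scale = 1.0 / math.sqrt(d)
    contexts = []
    for query in visual:
        weights = softmax([dot(query, key) * scale for key in audio])
        # Attention-weighted sum of the audio frames, one component at a time.
        contexts.append([sum(w * frame[i] for w, frame in zip(weights, audio)) for i in range(d)])
    return contexts


def fuse_tri_modal(visual: List[Vector], audio: List[Vector], motion: List[Vector]) -> Vector:
    """Concatenate per-frame visual, attended-audio and motion features, then mean-pool over time."""
    attended = audio_visual_attention(visual, audio)
    frames = [v + a + m for v, a, m in zip(visual, attended, motion)]
    n = len(frames)
    return [sum(frame[i] for frame in frames) / n for i in range(len(frames[0]))]


if __name__ == "__main__":
    visual = [[1.0, 0.0], [0.0, 1.0]]              # 2 video frames, 2-dim features
    audio = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # 3 audio frames, same dimension
    motion = [[0.1], [0.2]]                         # 1-dim motion feature per video frame
    print(fuse_tri_modal(visual, audio, motion))    # 5-dim pooled clip descriptor
```

In a complete detector the pooled vector would feed a real/fake classification head, which is omitted here. Attention lets each video frame weight the audio frames that best match it, which is one plausible way to use the audio stream as prior knowledge about what the lips should be doing.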

Findings

1. Outperforms state-of-the-art methods on the FTFDD, DFDC, and DF-TIMIT datasets.
2. Effectively captures disordered motion cues in fake videos.
3. Improves detection accuracy by integrating audio, visual, and motion features.

Abstract

DeepFake-based digital facial forgery threatens public media security, especially when lip manipulation is used in talking face generation, which makes fake video detection even more difficult. Because only the lip shape is changed to match the given speech, the identity-related facial features in such fake talking face videos are hard to discriminate. Combined with the lack of attention to the audio stream as prior knowledge, this makes detection failures on fake talking face videos inevitable. It is found that the optical flow of a fake talking face video is disordered, especially in the lip region, while the optical flow of a real video changes regularly, which means that motion features derived from optical flow are useful for capturing manipulation cues. In this study, a fake talking face detection network (FTFDNet) is proposed by incorporating visual, audio and motion features…
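
To make the "disordered versus regular optical flow" observation concrete, here is a minimal hand-crafted proxy. It is not FTFDNet's motion branch, which the abstract describes as a learned feature. The sketch assumes each element of `flows` is the mean (dx, dy) optical flow inside a lip-region box for one consecutive frame pair. Such values could, for example, be obtained by averaging the output of a dense estimator like OpenCV's `cv2.calcOpticalFlowFarneback` over that box, a step omitted here. The hypothetical `flow_disorder` score is the average magnitude of the change in that flow from one step to the next: smooth mouth motion changes gradually, while disordered motion jumps around.

```python
import math
from typing import List, Tuple

# Mean (dx, dy) optical flow inside the lip region for one consecutive frame pair.
Flow = Tuple[float, float]


def flow_disorder(flows: List[Flow]) -> float:
    """Average magnitude of the change in lip-region flow between consecutive steps.

    Regular (smoothly varying) mouth motion gives small changes; abrupt,
    disordered motion gives large ones. This is a hypothetical hand-crafted
    proxy, not FTFDNet's learned motion branch.
    """
    if len(flows) < 2:
        return 0.0
    changes = [
        math.hypot(x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(flows, flows[1:])
    ]
    return sum(changes) / len(changes)


if __name__ == "__main__":
    # Synthetic data only: a smooth opening/closing cycle vs. sign-flipping jitter.
    regular = [(math.sin(t / 5.0), 0.0) for t in range(20)]
    jittery = [(float((-1) ** t), 0.0) for t in range(20)]
    print(flow_disorder(regular))  # small: consecutive flows differ by at most 0.2
    print(flow_disorder(jittery))  # 2.0: every step flips direction
```

A single scalar like this discards most of the spatial structure of the flow field, which is presumably why a learned motion feature is used instead. The toy score only illustrates why temporal irregularity in the lip region can separate the two cases.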

Taxonomy

Topics: Face recognition and analysis · Digital Media Forensic Detection · Generative Adversarial Networks and Image Synthesis