Lips Are Lying: Spotting the Temporal Inconsistency between Audio and Visual in Lip-Syncing DeepFakes
Weifeng Liu, Tianyi She, Jiawei Liu, Boheng Li, Dongyu Yao, Ziyou Liang, Run Wang

TL;DR
This paper introduces a novel method for detecting lip-sync DeepFake videos by exploiting inconsistencies between lip movements and audio, achieving over 95% accuracy and creating a new dataset for further research.
Contribution
The paper presents the first dedicated approach to lip-forgery detection that leverages biological links between lips and head regions, along with a new high-quality dataset for the field.
Findings
Achieves over 95.3% accuracy in spotting lip-sync DeepFakes.
Demonstrates robustness against diverse input transformations.
Performs well in real-world scenarios like WeChat video calls.
Abstract
In recent years, DeepFake technology has achieved unprecedented success in high-quality video synthesis, but these methods also pose potential and severe security threats to humanity. DeepFake can be bifurcated into entertainment applications like face swapping and illicit uses such as lip-syncing fraud. However, lip-forgery videos, which neither change identity nor have discernible visual artifacts, present a formidable challenge to existing DeepFake detection methods. Our preliminary experiments have shown that existing methods often degrade drastically or even fail when tackling lip-syncing videos. In this paper, for the first time, we propose a novel approach dedicated to lip-forgery identification that exploits the inconsistency between lip movements and audio signals. We also mimic human natural cognition by capturing subtle biological links between lips…
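To make the idea of audio-visual inconsistency concrete, here is a minimal Python sketch. It is not the authors' model, whose architecture the abstract does not describe. It assumes a per-frame mouth-opening measurement is already available (for example from facial landmarks) and correlates it with per-frame audio energy. The function names `frame_energy` and `sync_score`, and all signal parameters, are hypothetical and chosen only for illustration.

```python
import numpy as np

def frame_energy(audio, sample_rate, fps):
    """Mean squared amplitude of the audio samples that fall in each video frame."""
    samples_per_frame = int(round(sample_rate / fps))
    n_frames = len(audio) // samples_per_frame
    trimmed = np.asarray(audio[: n_frames * samples_per_frame], dtype=float)
    return (trimmed.reshape(n_frames, samples_per_frame) ** 2).mean(axis=1)

def sync_score(mouth_opening, audio_energy):
    """Pearson correlation between mouth opening and audio energy (illustrative only)."""
    n = min(len(mouth_opening), len(audio_energy))
    a = np.asarray(mouth_opening[:n], dtype=float)
    b = np.asarray(audio_energy[:n], dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

# Synthetic demo (assumed values): 4 s of video at 25 fps, audio at 16 kHz.
fps, sample_rate, seconds = 25, 16000, 4
t_frames = np.arange(fps * seconds) / fps
t_audio = np.arange(sample_rate * seconds) / sample_rate

# Mouth opening that drives the audio loudness (the "real" speaker).
real_mouth = 0.5 * (1 + np.sin(2 * np.pi * 2 * t_frames))
envelope = np.repeat(real_mouth, sample_rate // fps)
audio = envelope * np.sin(2 * np.pi * 200 * t_audio)

# Mouth motion with a different rhythm, standing in for a mismatched video track.
other_mouth = 0.5 * (1 + np.sin(2 * np.pi * 3 * t_frames))

energy = frame_energy(audio, sample_rate, fps)
matched = sync_score(real_mouth, energy)
mismatched = sync_score(other_mouth, energy)
print(f"matched: {matched:.2f}, mismatched: {mismatched:.2f}")
```

On this synthetic signal the matched pair scores much higher than the mismatched one. Real lip-sync forgeries are generated to follow the audio closely, so a simple correlation like this would likely not separate them; the sketch only illustrates what comparing the two modalities over time means.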
Taxonomy
Topics: Speech and Audio Processing
