Do You See What I Say? Generalizable Deepfake Detection based on Visual Speech Recognition

Maheswar Bora; Tashvik Dhamija; Shukesh Reddy; Baptiste Chopin; Pranav Balaji; Abhijit Das; Antitza Dantcheva

arXiv:2511.22443 · cs.CV · December 1, 2025

Open Access

TL;DR

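This paper introduces FauxNet, a deepfake detection network built on pre-trained visual speech recognition (VSR) features. It reports stronger zero-shot detection than prior state-of-the-art methods, can attribute fake videos to their generation technique, and is evaluated across diverse datasets, including the newly created Authentica datasets.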

Contribution

The paper presents FauxNet, a novel deepfake detection model based on pre-trained VSR features, and introduces large-scale Authentica datasets for robust evaluation.

Findings

1. FauxNet outperforms the state of the art in zero-shot detection.
2. FauxNet can attribute videos to specific generation techniques.
3. The Authentica datasets provide extensive real and fake video data for benchmarking.

Abstract

Deepfake generation has witnessed remarkable progress, contributing to highly realistic generated images, videos, and audio. While technically intriguing, such progress has raised serious concerns related to the misuse of manipulated media. To mitigate such misuse, robust and reliable deepfake detection is urgently needed. Towards this, we propose a novel network, FauxNet, which is based on pre-trained Visual Speech Recognition (VSR) features. By extracting temporal VSR features from videos, we identify and segregate real videos from manipulated ones. The holy grail in this context is zero-shot detection, i.e., generalizable detection, which we focus on in this work. FauxNet consistently outperforms the state of the art in this setting. In addition, FauxNet is able to perform attribution, i.e., to distinguish between the generation techniques from which the videos stem. Finally, we propose new…
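
The abstract's core idea, classifying a video from temporal features produced by a pre-trained encoder, can be illustrated with a minimal sketch. This is not FauxNet's architecture, which the abstract does not specify. Everything below is an assumption made for illustration: random arrays stand in for per-frame VSR features, the clip descriptor is a simple mean and standard deviation over time, and the classifier is plain logistic regression. All function names are hypothetical.

```python
import numpy as np


def pool_temporal_features(frame_features: np.ndarray) -> np.ndarray:
    """Collapse a (num_frames, feature_dim) sequence into one clip descriptor.

    The descriptor concatenates the per-dimension mean and standard deviation
    over time, so it has length 2 * feature_dim.
    """
    if frame_features.ndim != 2:
        raise ValueError("expected an array of shape (num_frames, feature_dim)")
    return np.concatenate([frame_features.mean(axis=0), frame_features.std(axis=0)])


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_logistic_head(descriptors, labels, lr=0.5, steps=500):
    """Fit logistic regression by batch gradient descent (label 1 = fake)."""
    weights = np.zeros(descriptors.shape[1])
    bias = 0.0
    for _ in range(steps):
        error = sigmoid(descriptors @ weights + bias) - labels
        weights -= lr * (descriptors.T @ error) / len(labels)
        bias -= lr * error.mean()
    return weights, bias


def fake_probability(frame_features, weights, bias) -> float:
    """Score one clip: probability that it is fake under the fitted head."""
    descriptor = pool_temporal_features(frame_features)
    return float(sigmoid(descriptor @ weights + bias))


def make_synthetic_clip(rng, jitter, num_frames=20, feature_dim=8):
    """Hypothetical stand-in for VSR features: Gaussian noise with a chosen
    temporal jitter. Real features would come from a pre-trained VSR encoder."""
    return rng.normal(0.0, jitter, size=(num_frames, feature_dim))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Assumption for the toy data only: "fake" clips fluctuate more over time.
    clips = [make_synthetic_clip(rng, 0.1) for _ in range(50)]
    clips += [make_synthetic_clip(rng, 1.0) for _ in range(50)]
    labels = np.array([0.0] * 50 + [1.0] * 50)
    descriptors = np.stack([pool_temporal_features(c) for c in clips])
    weights, bias = train_logistic_head(descriptors, labels)
    print(fake_probability(make_synthetic_clip(rng, 1.0), weights, bias))
```

The sketch keeps the feature extractor frozen and trains only a small head on top, which is one common way to reuse pre-trained features. Whether FauxNet does the same is not stated in the text above.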

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection · Adversarial Robustness in Machine Learning