Towards Generalizable Deepfake Detection via Forgery-aware Audio-Visual Adaptation: A Variational Bayesian Approach

Fan Nie; Jiangqun Ni; Jian Zhang; Bin Zhang; Weizhe Zhang; Bin Li

arXiv:2511.19080·cs.MM·November 25, 2025

TL;DR

This paper introduces FoVB, a novel variational Bayesian framework for multi-modal deepfake detection that models audio-visual correlations as latent variables, improving generalization and detection accuracy.

Contribution

It proposes a forgery-aware audio-visual adaptation method using variational Bayes to better capture cross-modal inconsistencies in deepfake detection.

Findings

1. Outperforms state-of-the-art methods on multiple benchmarks.

2. Effectively models audio-visual correlations as Gaussian latent variables.

3. Enhances detection accuracy and generalization across diverse deepfake datasets.

Abstract

The widespread use of AI-generated content (AIGC) has brought not only unprecedented opportunities but also potential security concerns, e.g., audio-visual deepfakes. It is therefore of great importance to develop an effective and generalizable method for multi-modal deepfake detection. Typically, audio-visual correlation learning can expose subtle cross-modal inconsistencies, e.g., audio-visual misalignment, which serve as crucial clues in deepfake detection. In this paper, we reformulate correlation learning as variational Bayesian estimation, where the audio-visual correlation is approximated as a Gaussian-distributed latent variable, and thus develop a novel framework for deepfake detection, i.e., Forgery-aware Audio-Visual Adaptation with Variational Bayes (FoVB). Specifically, given the prior knowledge of pre-trained backbones, we adopt two core designs to estimate…
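To make the idea of a Gaussian-distributed latent variable concrete, here is a minimal sketch of the two generic building blocks of variational Bayesian estimation: reparameterized sampling and the KL regularizer. It is not the authors' FoVB implementation. It assumes a diagonal Gaussian posterior over a latent correlation vector and a standard normal prior; the function names, the latent dimension, and the choice of prior are hypothetical and are not taken from the paper.

```python
import numpy as np


def reparameterize(mu, log_var, rng):
    """Sample z ~ N(mu, diag(exp(log_var))) as z = mu + sigma * eps.

    Writing the sample this way keeps it a deterministic function of
    (mu, log_var), so gradients could flow through it in a deep
    learning framework.
    """
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ) per sample.

    Assumption: a standard normal prior; the paper may use a different one.
    """
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)


# Hypothetical posterior parameters for a batch of 2 clips, latent size 4.
rng = np.random.default_rng(0)
mu = np.zeros((2, 4))
log_var = np.zeros((2, 4))

z = reparameterize(mu, log_var, rng)      # sampled latent "correlation"
kl = kl_to_standard_normal(mu, log_var)   # zero when posterior equals prior
kl_shifted = kl_to_standard_normal(np.ones((2, 4)), log_var)
```

In a full model, `mu` and `log_var` would be predicted from audio and visual features, and the KL term would be added to a classification loss to form a variational objective. How FoVB actually does this is not stated in the truncated abstract above.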

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection · Adversarial Robustness in Machine Learning