FauForensics: Boosting Audio-Visual Deepfake Detection with Facial Action Units
Jian Wang, Baoyuan Wu, Li Liu, Qingshan Liu

TL;DR
FauForensics introduces a novel audio-visual deepfake detection framework utilizing facial action units and fine-grained frame-wise similarity measures, achieving state-of-the-art results and improved cross-dataset robustness.
Contribution
The paper presents a new representation based on biologically invariant facial action units, together with a fusion module for dynamic multimodal alignment, to improve deepfake detection accuracy and generalization.
Findings
Achieves state-of-the-art detection performance on FakeAVCeleb and LAV-DF datasets.
Demonstrates up to 4.83% improvement over existing methods in cross-dataset tests.
Effectively captures subtle facial muscle dynamics disrupted in deepfakes.
Abstract
The rapid evolution of generative AI has increased the threat of realistic audio-visual deepfakes, demanding robust detection methods. Existing solutions primarily address unimodal (audio or visual) forgeries but struggle with multimodal manipulations because they handle heterogeneous modality features inadequately and generalize poorly across datasets. To this end, we propose FauForensics, a novel framework built on biologically invariant facial action units (FAUs), which are quantitative descriptors of facial muscle activity linked to emotion physiology. FAUs serve as forgery-resistant representations that reduce domain dependency while capturing subtle dynamics often disrupted in synthetic content. In addition, instead of comparing entire video clips as in prior works, our method computes fine-grained frame-wise audiovisual similarities via a dedicated fusion module augmented…
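The contrast the abstract draws between clip-level comparison and frame-wise audiovisual similarity can be illustrated with a minimal sketch. This is not the paper's fusion module: the function names, the feature shapes, and the choice of cosine similarity are assumptions made for illustration, and the features are random stand-ins for time-aligned audio and FAU-based visual embeddings.

```python
import numpy as np

def framewise_cosine(visual, audio, eps=1e-8):
    """Cosine similarity per frame for time-aligned (T, D) feature arrays.

    Hypothetical helper; the paper's actual similarity measure is not
    specified in this summary.
    """
    v = visual / (np.linalg.norm(visual, axis=1, keepdims=True) + eps)
    a = audio / (np.linalg.norm(audio, axis=1, keepdims=True) + eps)
    return np.sum(v * a, axis=1)  # shape (T,)

def clip_cosine(visual, audio, eps=1e-8):
    """Single cosine similarity after mean-pooling each modality over time."""
    v = visual.mean(axis=0)
    a = audio.mean(axis=0)
    return float(v @ a / (np.linalg.norm(v) * np.linalg.norm(a) + eps))

# Synthetic stand-in features: 8 frames, 16 dimensions (assumed sizes).
rng = np.random.default_rng(0)
T, D = 8, 16
audio = rng.normal(size=(T, D))
visual = audio.copy()
visual[3] = -audio[3]  # simulate an audio-visual mismatch in frame 3 only

frame_sims = framewise_cosine(visual, audio)  # one score per frame
clip_sim = clip_cosine(visual, audio)         # one score for the whole clip

mismatched_frame = int(np.argmin(frame_sims))
print("per-frame similarities:", np.round(frame_sims, 3))
print("least consistent frame:", mismatched_frame)
print("clip-level similarity:", round(clip_sim, 3))
```

In this toy setup the per-frame scores identify which frame is inconsistent, while the clip-level score collapses the sequence into a single number and cannot localize the mismatch.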
