EMO-BOOST: Emotion-Augmented Audio-Visual Features for Improved Generalization in Deepfake Detection
Aritra Marik, Marcel Klemt, Anna Rohrbach

TL;DR
This paper introduces Emo-Boost, a multimodal deepfake detection framework that leverages emotion recognition to improve generalization to unseen deepfake manipulations.
Contribution
It proposes a novel emotion-augmented detection method that combines high-level semantic cues with low-level features for better generalization.
Findings
Emo-Boost improves cross-manipulation generalization AUC by 2.1% on FakeAVCeleb.
Emotion cues complement low-level features, enhancing deepfake detection.
The framework fuses audio-visual emotion recognition with existing detectors.
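To make the AUC finding concrete, here is a minimal sketch of how a ROC AUC gain could be measured on a held-out manipulation type. The helper `roc_auc` and all labels and scores are hypothetical toy values chosen for illustration. They are not results from the paper, and the paper's evaluation protocol may differ.

```python
def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) ROC AUC.

    Returns the fraction of (fake, real) pairs in which the fake sample
    receives the higher score; ties count as half a win.
    labels: 1 = fake, 0 = real. scores: higher means "more likely fake".
    """
    fake = [s for y, s in zip(labels, scores) if y == 1]
    real = [s for y, s in zip(labels, scores) if y == 0]
    if not fake or not real:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for f in fake:
        for r in real:
            if f > r:
                wins += 1.0
            elif f == r:
                wins += 0.5
    return wins / (len(fake) * len(real))


# Toy data (assumption): three fakes from an unseen manipulation, three reals.
labels = [1, 1, 1, 0, 0, 0]
baseline_scores = [0.9, 0.4, 0.6, 0.5, 0.2, 0.1]   # hypothetical low-level detector
augmented_scores = [0.9, 0.55, 0.6, 0.5, 0.2, 0.1]  # hypothetical emotion-augmented detector

gain = roc_auc(labels, augmented_scores) - roc_auc(labels, baseline_scores)
print(f"AUC gain on toy data: {gain:.3f}")
```

The quadratic pairwise loop is chosen for readability. For realistic dataset sizes a rank-based or library implementation would be the usual choice.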
Abstract
With every advancement in generative AI models, forensics is under increasing pressure. The constant emergence of new generation techniques makes it impossible to collect data for each manipulation to train a deepfake detection model. Thus, generalizing to deepfakes unseen during training is one of the major challenges in current deepfake detection research. To tackle this challenge, we employ high-level semantic cues and argue that these cues can support low-level focused approaches in generalizing to unseen types of manipulations. In this work, we study emotions as a high-level semantic cue. We propose Emo-Boost, a multimodal deepfake detection framework that fuses an off-the-shelf RGB- and acoustic-focused deepfake detector with our emotion-based deepfake detector EmoForensics. EmoForensics utilises vision and audio emotion recognition modules and models intra- and inter-modal…
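The visible part of the abstract does not say how the two detectors are fused. As one possible reading, the sketch below shows simple score-level (late) fusion of an off-the-shelf detector with an emotion-based one. The function `fuse_scores`, the `weight` parameter, and the example scores are all assumptions made for illustration, not the authors' method.

```python
def fuse_scores(base_score, emotion_score, weight=0.5):
    """Convex combination of two fake-probability scores in [0, 1].

    base_score:    output of a hypothetical low-level RGB/acoustic detector
    emotion_score: output of a hypothetical emotion-based detector
    weight:        share given to the emotion-based score (assumed tunable)
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return (1.0 - weight) * base_score + weight * emotion_score


# Hypothetical case: the low-level detector is unsure about an unseen
# manipulation, while the emotion-based detector flags an inconsistency.
fused = fuse_scores(base_score=0.4, emotion_score=0.8, weight=0.25)
print(f"fused score: {fused:.2f}")
```

A learned fusion layer over intermediate features would be an equally plausible design. The weighted average is shown only because it is the simplest way to let a high-level cue adjust a low-level score.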
