The Affective Bridge: Preserving Speech Representations while Enhancing Deepfake Detection via Emotional Constraints

Yupei Li; Chenyang Lyu; Longyue Wang; Weihua Luo; Kaifu Zhang; Björn W. Schuller

arXiv:2512.11241 · cs.SD · February 26, 2026

TL;DR

This paper introduces an emotion-guided training framework for speech deepfake detection that enhances discriminative cues while preserving original speech semantics, leading to improved detection accuracy.

Contribution

It proposes a novel, feature-agnostic, and non-destructive training method using emotion as a bridging constraint to improve speech deepfake detection.

Findings

01 Up to 6% accuracy improvement on FakeOrReal

02 Up to 2% accuracy improvement on IntheWild

03 Reductions in equal error rate
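
For readers unfamiliar with the metric in the third finding, equal error rate (EER) is the operating point at which the false-acceptance rate and the false-rejection rate of a detector coincide; lower is better. The following is a minimal sketch of one common way to estimate it from detector scores, not the paper's evaluation code. The function name `equal_error_rate`, the score convention (higher means more likely fake), and the label convention (1 = fake, 0 = real) are assumptions made for illustration.

```python
def equal_error_rate(scores, labels):
    """Estimate EER by sweeping a threshold over the observed scores.

    Assumed conventions (not from the paper):
      scores: higher value = detector believes the clip is more likely fake
      labels: 1 = fake, 0 = real
    """
    real = [s for s, y in zip(scores, labels) if y == 0]
    fake = [s for s, y in zip(scores, labels) if y == 1]
    if not real or not fake:
        raise ValueError("need at least one real and one fake sample")

    # Candidate thresholds: every distinct score, plus +inf so that the
    # "flag nothing as fake" operating point is also considered.
    thresholds = sorted(set(scores)) + [float("inf")]

    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False-acceptance rate: real clips flagged as fake at threshold t.
        far = sum(s >= t for s in real) / len(real)
        # False-rejection rate: fake clips that fall below threshold t.
        frr = sum(s < t for s in fake) / len(fake)
        gap = abs(far - frr)
        if gap < best_gap:
            # With finitely many samples the two rates rarely match exactly,
            # so report their average at the closest crossing.
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

A detector that separates the two classes perfectly yields an EER of 0, while one whose real and fake scores overlap heavily moves toward 0.5. Published results often interpolate between thresholds, so values from this sketch may differ slightly from those reported by evaluation toolkits.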

Abstract

Speech deepfake detection (DFD) has benefited from diverse acoustic and semantic speech representations, many of which encode valuable speech information and are costly to train. Existing approaches typically enhance DFD by tuning the representations or applying post-hoc classification on frozen features, limiting control over improving discriminative deepfake cues without distorting original semantics. We find that emotion is encoded across diverse speech features and correlates with DFD. Therefore, we introduce a unified, feature-agnostic, and non-destructive training framework that uses emotion as a bridging constraint to guide speech features toward DFD, treating emotion recognition as a representation alignment objective rather than an auxiliary task, while preserving the original semantic information. Experiments on FakeOrReal and IntheWild show accuracy improvements of up to 6% and…

Taxonomy

Topics: Emotion and Mood Recognition · Speech Recognition and Synthesis · Sentiment Analysis and Opinion Mining