Disentangling Speaker Traits for Deepfake Source Verification via Chebyshev Polynomial and Riemannian Metric Learning

Xi Xuan, Wenxin Zhang, Zhiyu Li, Jennifer Williams, Ville Hautamäki, and Tomi H. Kinnunen

arXiv:2603.21875·eess.AS·March 24, 2026

TL;DR

This paper introduces a novel metric learning framework that disentangles speaker traits from source embeddings in deepfake speech verification, utilizing Chebyshev polynomials and Riemannian geometry to improve source discrimination.

Contribution

The paper proposes a speaker-disentangled metric learning (SDML) framework with two new loss functions that improve speaker disentanglement and source verification accuracy in deepfake speech detection.

Findings

01 Effective source verification on the MLAAD benchmark

02 Improved disentanglement of speaker traits

03 Robust performance under new protocols

Abstract

Speech deepfake source verification systems aim to determine whether two synthetic speech utterances originate from the same source generator, often assuming that the resulting source embeddings are independent of speaker traits. However, this assumption remains unverified. In this paper, we first investigate the impact of speaker factors on source verification. We then propose a speaker-disentangled metric learning (SDML) framework incorporating two novel loss functions. The first leverages Chebyshev polynomials to mitigate gradient instability during disentanglement optimization. The second projects source and speaker embeddings into hyperbolic space, leveraging Riemannian metric distances to reduce speaker information and learn more discriminative source features. Experimental results on the MLAAD benchmark, evaluated under four newly proposed protocols designed for source-speaker…

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Emotion and Mood Recognition