Curved Worlds, Clear Boundaries: Generalizing Speech Deepfake Detection using Hyperbolic and Spherical Geometry Spaces

Farhan Sheth; Girish; Mohd Mujtaba Akhtar; Muskaan Singh

arXiv:2511.10793 · eess.AS · November 17, 2025 · ACL

TL;DR

This paper introduces RHYME, a geometry-aware framework that uses hyperbolic and spherical spaces to improve generalization in speech deepfake detection across diverse synthesis methods.

Contribution

It proposes a novel non-Euclidean embedding approach that captures shared structural distortions in synthetic speech, enabling better cross-paradigm detection.

Findings

01. RHYME outperforms existing methods in cross-paradigm detection accuracy.

02. Hyperbolic and spherical geometries effectively model hierarchical and angular features.

03. RHYME achieves state-of-the-art results in generalizable speech deepfake detection.

Abstract

In this work, we address the challenge of generalizable audio deepfake detection (ADD) across diverse speech synthesis paradigms, including conventional text-to-speech (TTS) systems and modern diffusion or flow-matching (FM) based generators. Prior work has mostly targeted individual synthesis families and often fails to generalize across paradigms due to overfitting to generation-specific artifacts. We hypothesize that synthetic speech, irrespective of its generative origin, leaves behind shared structural distortions in the embedding space that can be aligned through geometry-aware modeling. To this end, we propose RHYME, a unified detection framework that fuses utterance-level embeddings from diverse pretrained speech encoders using non-Euclidean projections. RHYME maps representations into hyperbolic and spherical manifolds, where hyperbolic geometry excels at modeling hierarchical…
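The abstract refers to mapping embeddings into hyperbolic and spherical manifolds but, as excerpted here, does not give the projection formulas. The sketch below is therefore not RHYME's implementation. It only illustrates two standard textbook constructions that such a mapping could plausibly use: the exponential map at the origin of the Poincaré ball (hyperbolic) and L2 normalization onto the unit sphere (spherical), each with its geodesic distance. All function names and the curvature parameter `c` are assumptions introduced for illustration.

```python
import math


def l2_norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def expmap0(v, c=1.0, eps=1e-5):
    """Map a Euclidean vector onto the Poincare ball of curvature -c.

    Hypothetical helper: the standard exponential map at the origin,
    not a formula taken from the paper.
    """
    norm = l2_norm(v)
    if norm == 0.0:
        return [0.0] * len(v)
    sqrt_c = math.sqrt(c)
    # tanh keeps the radius below 1/sqrt(c); the clip guards against
    # floating-point rounding onto the boundary of the ball.
    radius = min(math.tanh(sqrt_c * norm) / sqrt_c, (1.0 - eps) / sqrt_c)
    return [radius * x / norm for x in v]


def poincare_distance(x, y, c=1.0):
    """Geodesic distance between two points inside the Poincare ball."""
    diff_sq = sum((a - b) ** 2 for a, b in zip(x, y))
    denom = (1.0 - c * sum(a * a for a in x)) * (1.0 - c * sum(b * b for b in y))
    return math.acosh(1.0 + 2.0 * c * diff_sq / denom) / math.sqrt(c)


def to_sphere(v):
    """Project a nonzero Euclidean vector onto the unit sphere."""
    norm = l2_norm(v)
    if norm == 0.0:
        raise ValueError("cannot project the zero vector onto the sphere")
    return [x / norm for x in v]


def sphere_distance(u, v):
    """Geodesic (angular) distance between two unit vectors, in radians."""
    dot = sum(a * b for a, b in zip(u, v))
    # Clamp to [-1, 1] so rounding error never leaves the domain of acos.
    return math.acos(max(-1.0, min(1.0, dot)))


if __name__ == "__main__":
    embedding = [0.3, 0.4]  # stand-in for an utterance-level embedding
    print("hyperbolic point:", expmap0(embedding))
    print("spherical point:", to_sphere(embedding))
```

In this sketch, hyperbolic distance grows rapidly as points approach the boundary of the ball, which is the property usually cited for representing hierarchies, while the spherical distance depends only on the angle between embeddings.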

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Generative Adversarial Networks and Image Synthesis