Generalizable Audio Deepfake Detection via Hierarchical Structure Learning and Feature Whitening in Poincaré sphere

Mingru Yang; Yanmei Gu; Qianhua He; Yanxiong Li; Peirong Zhang; Yongqiang Chen; Zhiming Wang; Huijia Zhu; Jian Liu; Weiqiang Wang

arXiv:2508.01897·cs.SD·August 5, 2025

TL;DR

This paper introduces Poin-HierNet, a novel framework for audio deepfake detection that constructs hierarchical, domain-invariant representations in the Poincaré sphere to improve generalization across diverse attacks and domains.

Contribution

The paper proposes a new hierarchical learning framework in the Poincaré sphere, combining prototype learning, structure learning, and feature whitening for robust audio deepfake detection.

Findings

01 Outperforms state-of-the-art methods on multiple datasets

02 Achieves lower Equal Error Rate across diverse conditions

03 Demonstrates strong domain generalization capabilities

Abstract

Audio deepfake detection (ADD) faces critical generalization challenges due to diverse real-world spoofing attacks and domain variations. Existing methods primarily rely on Euclidean distances and therefore fail to adequately capture the intrinsic hierarchical structures associated with attack categories and domain factors. To address these issues, we design a novel framework, Poin-HierNet, to construct domain-invariant hierarchical representations in the Poincaré sphere. Poin-HierNet includes three key components: 1) Poincaré Prototype Learning (PPL), which uses several data prototypes to align sample features and capture multilevel hierarchies beyond human labels; 2) Hierarchical Structure Learning (HSL), which leverages top prototypes to establish a tree-like hierarchical structure from data prototypes; and 3) Poincaré Feature Whitening (PFW), which enhances domain invariance by applying feature…
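To illustrate why a hyperbolic distance can suit hierarchical structure better than a Euclidean one, the sketch below computes the standard geodesic distance in the unit Poincaré ball (curvature -1) and assigns a feature to its nearest prototype. It is a minimal illustration only: the excerpt above does not give the paper's exact formulation, and the function names `poincare_distance` and `nearest_prototype`, the `eps` guard, and the plain-list inputs are assumptions made for this example.

```python
import math


def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between two points strictly inside the unit Poincaré ball.

    Standard formula for curvature -1:
        d(u, v) = arcosh(1 + 2 * ||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))
    """
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # eps guards against division by a vanishing denominator near the boundary.
    denom = max((1.0 - sq_u) * (1.0 - sq_v), eps)
    return math.acosh(1.0 + 2.0 * sq_diff / denom)


def nearest_prototype(feature, prototypes):
    """Index of the prototype closest to `feature` under the Poincaré distance.

    Hypothetical helper: a stand-in for prototype alignment, not the paper's PPL loss.
    """
    return min(
        range(len(prototypes)),
        key=lambda i: poincare_distance(feature, prototypes[i]),
    )


if __name__ == "__main__":
    # The same Euclidean gap (0.1) is far larger near the boundary than near the origin.
    near_origin = poincare_distance([0.0, 0.0], [0.0, 0.1])
    near_boundary = poincare_distance([0.9, 0.0], [0.9, 0.1])
    print(near_origin, near_boundary)
    print(nearest_prototype([0.4, 0.0], [[0.5, 0.0], [-0.5, 0.0]]))
```

Because distances grow rapidly toward the boundary of the ball, points near the origin can act as coarse, high-level nodes while points near the boundary act as fine-grained leaves, which is the property that makes this geometry a natural fit for tree-like structures.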

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection · Adversarial Robustness in Machine Learning