HyperPotter: Spell the Charm of High-Order Interactions in Audio Deepfake Detection
Qing Wen, Haohao Li, Zhongjie Ba, Peng Cheng, Miao He, Li Lu, Kui Ren

TL;DR
HyperPotter introduces a hypergraph-based approach to model high-order interactions in audio deepfake detection, significantly improving accuracy and generalization across multiple datasets and attack types.
Contribution
It is the first to explicitly model high-order interactions in audio deepfake detection using hypergraphs with clustering-based hyperedges.
Findings
Achieves an average relative gain of 22.15% over its baseline across 11 datasets.
Outperforms state-of-the-art methods by 13.96% on 4 challenging cross-domain datasets.
Demonstrates strong generalization to diverse attacks and speakers.
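As a reading aid for the percentages above: this summary does not state which metric the gains are measured on, so the following assumes an error-type score (lower is better, for example an equal error rate). The symbols e_base and e_new are introduced here for illustration and do not come from the paper.

```latex
\[
  \text{relative gain}
  = \frac{e_{\text{base}} - e_{\text{new}}}{e_{\text{base}}} \times 100\%
\]
% Illustrative, made-up numbers (not results from the paper):
% e_base = 10.00, e_new = 7.785  =>  (10.00 - 7.785) / 10.00 = 22.15%
```

An "average" relative gain would then typically be the mean of this quantity over the individual datasets, though the summary does not say how the averaging was done.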
Abstract
Advances in AIGC technologies have enabled the synthesis of highly realistic audio deepfakes capable of deceiving human auditory perception. Although numerous audio deepfake detection (ADD) methods have been developed, most rely on local temporal/spectral features or pairwise relations, overlooking high-order interactions (HOIs). HOIs capture discriminative patterns that emerge from multiple feature components beyond their individual contributions. We propose HyperPotter, a hypergraph-based framework that explicitly models these synergistic HOIs through clustering-based hyperedges with class-aware prototype initialization. Extensive experiments demonstrate that HyperPotter surpasses its baseline by an average relative gain of 22.15% across 11 datasets and outperforms state-of-the-art methods by 13.96% on 4 challenging cross-domain datasets, demonstrating superior generalization to…
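To make the idea of clustering-based hyperedges with class-aware prototype initialization more concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes that "class-aware" means cluster centres are seeded from labelled examples of each class, that a hyperedge links a refined centre to its nearest feature nodes, and that propagation uses the standard normalized hypergraph convolution. All function names, sizes, and parameters (`class_aware_init`, `kmeans_hyperedges`, `top_k`, and so on) are hypothetical.

```python
import numpy as np

def class_aware_init(feats, labels, per_class, seed=0):
    """Assumed scheme: seed prototypes by sampling labelled features per class."""
    rng = np.random.default_rng(seed)
    protos = []
    for c in np.unique(labels):
        members = feats[labels == c]
        idx = rng.choice(len(members), size=per_class, replace=False)
        protos.append(members[idx])
    return np.vstack(protos)  # (K, D) with K = n_classes * per_class

def sq_dists(nodes, centres):
    """Squared Euclidean distance between every node and every centre, (N, K)."""
    return ((nodes[:, None, :] - centres[None, :, :]) ** 2).sum(-1)

def kmeans_hyperedges(nodes, protos, top_k, iters=10):
    """Refine prototypes with k-means, then build an incidence matrix H (N, K)."""
    centres = protos.copy()
    for _ in range(iters):
        assign = sq_dists(nodes, centres).argmin(axis=1)
        for k in range(len(centres)):
            if np.any(assign == k):  # keep the old centre if a cluster is empty
                centres[k] = nodes[assign == k].mean(axis=0)
    d = sq_dists(nodes, centres)
    H = np.zeros((len(nodes), len(centres)))
    # Each hyperedge joins the top_k nodes closest to its centre, so one
    # hyperedge groups many nodes at once instead of a single pair.
    nearest = np.argsort(d, axis=0)[:top_k]  # (top_k, K)
    H[nearest, np.arange(len(centres))] = 1.0
    # Also attach every node to its closest centre so no node has degree zero.
    H[np.arange(len(nodes)), d.argmin(axis=1)] = 1.0
    return H

def hypergraph_conv(X, H, theta):
    """One layer: ReLU(Dv^-1/2 H De^-1 H^T Dv^-1/2 X theta), unit edge weights."""
    dv = H.sum(axis=1)  # node degrees
    de = H.sum(axis=0)  # hyperedge degrees
    Hn = H / np.sqrt(dv)[:, None]
    A = Hn @ np.diag(1.0 / de) @ Hn.T
    return np.maximum(A @ X @ theta, 0.0)

# Toy data: random stand-ins for labelled training features and test nodes.
rng = np.random.default_rng(0)
train = rng.standard_normal((40, 8))
labels = np.repeat([0, 1], 20)  # 0 and 1 as two placeholder classes
protos = class_aware_init(train, labels, per_class=2)
nodes = rng.standard_normal((12, 8))
H = kmeans_hyperedges(nodes, protos, top_k=4)
out = hypergraph_conv(nodes, H, rng.standard_normal((8, 5)))
print(H.shape, out.shape)
```

Because each column of `H` can contain more than two nonzero entries, a single propagation step mixes information across a whole group of nodes, which is what distinguishes a hypergraph from a pairwise graph in this setting.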
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Digital Media Forensic Detection · Emotion and Mood Recognition
