Cyclostationarity Analysis as a Complement to Self-Supervised Representations for Speech Deepfake Detection
Cemal Hanilçi, Md Sahidullah, Tomi Kinnunen

TL;DR
This paper introduces a cyclostationarity-inspired spectral correlation density feature extraction method that complements self-supervised representations, significantly improving speech deepfake detection accuracy across multiple datasets.
Contribution
The paper proposes a cyclostationarity-inspired acoustic feature extraction framework that captures higher-order spectral dependencies for speech deepfake detection.
Findings
Fusing SSL and SCD features reduces the equal error rate (EER) from 8.28% to 0.98% on ASVspoof 2019 LA.
SCD features provide complementary discriminative information to existing embeddings.
The proposed method improves detection performance on multiple challenging datasets.
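For readers unfamiliar with the metric behind the first finding, the following is a minimal sketch of how an EER can be computed from detection scores, together with the relative reduction implied by the two reported figures. The function name `compute_eer` and the convention that higher scores mean "more bona fide" are assumptions made for illustration; the paper's own evaluation scripts are not shown on this page.

```python
import numpy as np

def compute_eer(bonafide_scores, spoof_scores):
    """Approximate the equal error rate from two score sets.

    Assumed convention: a higher score means "more likely bona fide".
    """
    bona = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    # Every observed score is tried as a decision threshold.
    thresholds = np.sort(np.concatenate([bona, spoof]))
    # False rejection: bona fide trials scoring below the threshold.
    frr = np.array([(bona < th).mean() for th in thresholds])
    # False acceptance: spoofed trials scoring at or above the threshold.
    far = np.array([(spoof >= th).mean() for th in thresholds])
    # The EER is taken where the two error rates are closest.
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2.0

# Relative EER reduction implied by the reported figures (8.28% to 0.98%).
relative_reduction = (8.28 - 0.98) / 8.28
```

Under these figures, `relative_reduction` comes out at roughly 0.88, i.e. the fused system's EER is about 88% lower in relative terms.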
Abstract
Speech deepfake detection (SDD) is essential for maintaining trust in voice-driven technologies and digital media. Although recent SDD systems increasingly rely on self-supervised learning (SSL) representations that capture rich contextual information, complementary signal-driven acoustic features remain important for modeling fine-grained structural properties of speech. Most existing acoustic front ends are based on time-frequency representations, which do not fully exploit higher-order spectral dependencies inherent in speech signals. We introduce a cyclostationarity-inspired acoustic feature extraction framework for SDD based on spectral correlation density (SCD). The proposed features model periodic statistical structures in speech by capturing spectral correlations between frequency components. In particular, we propose temporally structured SCD features that characterize the…
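To make the idea of "spectral correlations between frequency components" concrete, here is a minimal sketch of a generic textbook SCD estimate (a time-smoothed cyclic periodogram). It is not the paper's feature extractor: the function name `scd_estimate`, the Hann window, and the parameters `frame_len`, `hop`, and `max_shift` are all assumptions chosen for illustration, and the temporally structured SCD features described in the abstract are not reproduced here.

```python
import numpy as np

def scd_estimate(x, frame_len=256, hop=128, max_shift=32):
    """Generic time-smoothed cyclic periodogram (illustrative only).

    Returns a complex array of shape (max_shift + 1, frame_len), where row `a`
    estimates the correlation between frequency bins k + a and k, averaged
    over frames. Row 0 is the ordinary averaged power spectrum.
    """
    x = np.asarray(x, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    # Slice the signal into overlapping frames: shape (n_frames, frame_len).
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    spectra = np.fft.fft(x[idx] * window, axis=1)
    t = np.arange(n_frames)[:, None]
    scd = np.empty((max_shift + 1, frame_len), dtype=complex)
    for a in range(max_shift + 1):
        # shifted[:, k] holds spectra[:, (k + a) % frame_len].
        shifted = np.roll(spectra, -a, axis=1)
        # Compensate the phase advance that the frame offset t * hop
        # introduces at cyclic frequency a / frame_len.
        phase = np.exp(-2j * np.pi * a * t * hop / frame_len)
        scd[a] = np.mean(shifted * np.conj(spectra) * phase, axis=0)
    return scd
```

For a stationary signal, only the `a = 0` row carries substantial energy; periodic statistical structure, such as amplitude modulation at a fixed rate, shows up as additional energy in the rows whose bin shift matches the modulation frequency.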
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Generative Adversarial Networks and Image Synthesis
