Cyclostationarity Analysis as a Complement to Self-Supervised Representations for Speech Deepfake Detection

Cemal Hanilçi; Md Sahidullah; Tomi Kinnunen

arXiv:2603.03921 · eess.AS · March 5, 2026

TL;DR

This paper introduces cyclostationarity-inspired acoustic features based on spectral correlation density (SCD) that complement self-supervised learning (SSL) representations. Fusing the two reduces the equal error rate (EER) on ASVspoof 2019 LA from 8.28% to 0.98% and improves speech deepfake detection on multiple datasets.

Contribution

The paper proposes a cyclostationarity-based acoustic feature extraction framework that captures higher-order spectral dependencies for speech deepfake detection.

Findings

01. Fusion of SSL and SCD features reduces EER from 8.28% to 0.98% on ASVspoof 2019 LA.

02. SCD features provide complementary discriminative information to existing embeddings.

03. The proposed method improves detection performance on multiple challenging datasets.
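
The first finding is stated in terms of the equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. The following is a minimal sketch of how an EER can be estimated from detector scores, assuming that higher scores indicate bona fide speech. The function name and the toy scores are illustrative only; this is not the official ASVspoof scoring tool and does not reproduce the paper's numbers.

```python
import numpy as np


def equal_error_rate(bonafide_scores, spoof_scores):
    """Estimate the EER, assuming higher scores mean 'more bona fide'."""
    bonafide = np.asarray(bonafide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)

    # Every observed score is a candidate decision threshold.
    thresholds = np.sort(np.concatenate([bonafide, spoof]))

    # False rejection rate: bona fide trials scored below the threshold.
    frr = np.array([(bonafide < t).mean() for t in thresholds])
    # False acceptance rate: spoof trials scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])

    # Take the threshold where the two error rates are closest and
    # report their average as the EER estimate.
    idx = np.argmin(np.abs(far - frr))
    return 0.5 * (far[idx] + frr[idx])


# Toy scores (hypothetical, for illustration only).
toy_bonafide = [0.9, 0.8, 0.4, 0.7]
toy_spoof = [0.1, 0.2, 0.6, 0.3]
print(f"EER = {100 * equal_error_rate(toy_bonafide, toy_spoof):.2f}%")
```

With few trials the estimate is coarse, because the error rates can only change in steps of one trial; evaluation toolkits typically interpolate between thresholds.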

Abstract

Speech deepfake detection (SDD) is essential for maintaining trust in voice-driven technologies and digital media. Although recent SDD systems increasingly rely on self-supervised learning (SSL) representations that capture rich contextual information, complementary signal-driven acoustic features remain important for modeling fine-grained structural properties of speech. Most existing acoustic front ends are based on time-frequency representations, which do not fully exploit higher-order spectral dependencies inherent in speech signals. We introduce a cyclostationarity-inspired acoustic feature extraction framework for SDD based on spectral correlation density (SCD). The proposed features model periodic statistical structures in speech by capturing spectral correlations between frequency components. In particular, we propose temporally structured SCD features that characterize the…
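
To make "spectral correlations between frequency components" concrete, here is a minimal sketch of a generic time-smoothed cyclic periodogram, one standard way to estimate a spectral correlation density. It is an assumption-laden illustration and not the paper's temporally structured SCD features: the function name, the parameters `nfft`, `hop`, and `max_shift`, the Hann window, and the restriction of cyclic frequencies to whole FFT bins are all choices made here for brevity.

```python
import numpy as np


def spectral_correlation_density(x, nfft=64, hop=16, max_shift=16):
    """Time-smoothed cyclic periodogram (illustrative sketch).

    Returns a complex array scd[a, k] estimating the correlation between
    FFT bin k and bin k - a, i.e. at cyclic frequency a / nfft
    (cycles per sample). Entries with k < a are left at zero.
    """
    x = np.asarray(x, dtype=float)
    if len(x) < nfft:
        raise ValueError("signal must contain at least nfft samples")

    # Slice the signal into overlapping frames.
    n_frames = 1 + (len(x) - nfft) // hop
    starts = hop * np.arange(n_frames)
    frames = x[starts[:, None] + np.arange(nfft)[None, :]]

    # Windowed short-time spectra, shape (n_frames, nfft // 2 + 1).
    window = np.hanning(nfft)
    spec = np.fft.rfft(frames * window, axis=1)
    n_bins = spec.shape[1]

    scd = np.zeros((max_shift + 1, n_bins), dtype=complex)
    for a in range(max_shift + 1):
        # Each frame's spectrum is measured relative to the frame start,
        # so the cross-product is re-referenced to absolute time before
        # averaging across frames.
        phase = np.exp(-2j * np.pi * a * starts / nfft)
        prod = spec[:, a:] * np.conj(spec[:, : n_bins - a])
        scd[a, a:] = (prod * phase[:, None]).mean(axis=0)

    # Normalize by the window energy so row a = 0 is a PSD-like estimate.
    return scd / np.sum(window ** 2)


# Demo on synthetic data (not speech): white noise whose amplitude is
# modulated periodically, which makes it cyclostationary at 6 / 64.
rng = np.random.default_rng(0)
t = np.arange(16384)
signal = (1 + np.cos(2 * np.pi * 6 * t / 64)) * rng.standard_normal(t.size)
scd = spectral_correlation_density(signal)
strength = [np.abs(scd[a, a:]).mean() for a in range(scd.shape[0])]
print("mean |SCD| per cyclic bin:", np.round(strength, 3))
```

Row `a = 0` corresponds to an ordinary power spectrum estimate; the rows with `a > 0` carry the cross-frequency correlations that a magnitude spectrogram discards, which is the kind of information the abstract describes as complementary.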

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Generative Adversarial Networks and Image Synthesis