WST-X Series: Wavelet Scattering Transform for Interpretable Speech Deepfake Detection

Xi Xuan; Davide Carbone; Wenxin Zhang; Ruchi Pandey; Tomi H. Kinnunen

arXiv:2602.02980·eess.AS·May 1, 2026

TL;DR

This paper introduces the WST-X series, a wavelet scattering transform-based feature extractor for speech deepfake detection, combining interpretability and high-level information capture, leading to superior performance on multiple benchmarks.

Contribution

The WST-X series is a novel feature extraction method that merges the advantages of hand-crafted and SSL features using wavelet scattering transforms for improved deepfake detection.

Findings

01. WST-X outperforms existing front-ends on the Deepfake-Eval-2024 benchmark.

02. A small averaging scale ($J$) and high-frequency resolutions ($Q$, $L$) are key for detecting subtle artifacts.

03. Stable, translation-invariant features are crucial for effective speech deepfake detection.
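
For reference, the scattering coefficients behind these findings are commonly written as below. This is the textbook formulation, given here as an assumption about notation; the paper's own definitions may differ in detail. Here $\psi_{\lambda}$ denotes a band-pass wavelet (with $Q$ wavelets per octave) and $\phi_J$ a low-pass filter whose support is on the order of $2^J$ samples.

$$
\begin{aligned}
S_0 x(t) &= (x * \phi_J)(t), \\
S_1 x(t, \lambda_1) &= \big(|x * \psi_{\lambda_1}| * \phi_J\big)(t), \\
S_2 x(t, \lambda_1, \lambda_2) &= \big(\big||x * \psi_{\lambda_1}| * \psi_{\lambda_2}\big| * \phi_J\big)(t).
\end{aligned}
$$

Under this formulation, a smaller $J$ shortens the averaging window, so less fine temporal detail is smoothed away, while the features remain invariant only to correspondingly smaller shifts.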

Abstract

In this work, we focus on front-end design for speech deepfake detectors, the component that determines the discriminative acoustic cues provided to the classifier. Existing approaches are primarily categorized into two types. Hand-crafted filterbank features are transparent but limited in capturing higher-level information. SSL features, in turn, lack interpretability and may overlook fine-grained spectral anomalies. We propose the WST-X series, a novel family of feature extractors that combines the best of both worlds via the wavelet scattering transform (WST), which cascades wavelet convolutions with modulus nonlinearities to produce deformation-stable, multi-scale features. Experiments on the recent Deepfake-Eval-2024 benchmark, together with cross-dataset evaluations on the SpoofCeleb and In-the-Wild datasets, show that WST-X outperforms existing front-ends by a wide margin. Our analysis…
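
The cascade described in the abstract (wavelet convolution, modulus, then low-pass averaging) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `gabor_bank` and `scattering_1d`, the Gabor-like filter shapes, and the constants `0.4` and `0.1` are all assumptions chosen for illustration, and there is no subsampling. Only the roles of `J` (averaging scale) and `Q` (filters per octave) follow the description above.

```python
import numpy as np

def gabor_bank(n, J, Q):
    """Frequency responses of J*Q Gabor-like band-pass filters (illustrative constants)."""
    freqs = np.fft.fftfreq(n)                  # cycles per sample, in [-0.5, 0.5)
    k = np.arange(J * Q)
    xi = 0.4 * 2.0 ** (-k / Q)                 # centre frequencies, Q per octave
    sigma = xi / (2.0 * Q)                     # bandwidth proportional to centre frequency
    return np.exp(-0.5 * ((freqs[None, :] - xi[:, None]) / sigma[:, None]) ** 2)

def scattering_1d(x, J=4, Q=4):
    """Zeroth-, first- and second-order scattering features, shape (paths, len(x))."""
    n = len(x)
    psi = gabor_bank(n, J, Q)
    # Gaussian low-pass whose time support is on the order of 2**J samples.
    phi = np.exp(-0.5 * (np.fft.fftfreq(n) / (0.1 * 2.0 ** (-J))) ** 2)

    def lowpass(u):
        return np.real(np.fft.ifft(np.fft.fft(u, axis=-1) * phi, axis=-1))

    s0 = lowpass(x)[None, :]                                          # x * phi
    u1 = np.abs(np.fft.ifft(np.fft.fft(x)[None, :] * psi, axis=-1))   # |x * psi_1|
    s1 = lowpass(u1)                                                  # |x * psi_1| * phi
    u1_hat = np.fft.fft(u1, axis=-1)
    # Second order: filter each envelope with lower-frequency wavelets only.
    s2 = [lowpass(np.abs(np.fft.ifft(u1_hat[i] * psi[j])))
          for i in range(len(psi)) for j in range(i + 1, len(psi))]
    s2 = np.array(s2).reshape(-1, n)
    return np.vstack([s0, s1, s2])
```

With `J=4` and `Q=4` this sketch uses 16 first-order filters and 120 second-order paths, so a 1024-sample input yields a feature array of shape `(137, 1024)`. Because every step is a circular convolution or a pointwise modulus, shifting the input shifts the features by the same amount, and the averaging by `phi` makes them change only slowly under small shifts.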

Code & Models

Repositories

xxuan-acoustics/WST-X-Series (GitHub)
