Complex-valued neural networks for voice anti-spoofing

Nicolas M. Müller; Philip Sperl; Konstantin Böttinger

arXiv:2308.11800 · cs.SD · August 24, 2023

TL;DR

This paper introduces complex-valued neural networks that process complex-valued frequency-domain (CQT) representations of audio, retaining phase information and improving voice anti-spoofing detection while enabling explainability.
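
The phase argument can be made concrete with a small sketch that is not taken from the paper: it uses a plain discrete Fourier transform as a stand-in for the CQT and compares two tones that differ only in phase. Their magnitude spectra are identical, so a magnitude-only front end cannot separate them, while the complex-valued spectra differ. The helper `dft` and the signal parameters are illustrative assumptions.

```python
import cmath
import math
from typing import List


def dft(x: List[float]) -> List[complex]:
    """Naive O(N^2) discrete Fourier transform (stand-in for a CQT)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


N = 16
# Same frequency (bin 3) and amplitude; only the phase offset differs.
a = [math.cos(2 * math.pi * 3 * t / N) for t in range(N)]
b = [math.cos(2 * math.pi * 3 * t / N + math.pi / 3) for t in range(N)]

A, B = dft(a), dft(b)
mag_equal = all(abs(abs(p) - abs(q)) < 1e-9 for p, q in zip(A, B))
complex_equal = all(abs(p - q) < 1e-9 for p, q in zip(A, B))
print(mag_equal, complex_equal)  # True False
# Phase difference at bin 3, which only the complex spectrum preserves (about pi/3).
print(round(cmath.phase(B[3]) - cmath.phase(A[3]), 3))
```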

Contribution

The paper proposes a novel complex-valued neural network approach that jointly uses magnitude and phase information for improved anti-spoofing detection and interpretability.
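
One common way to build a complex-valued layer, shown here as a hedged sketch rather than the paper's exact architecture, is to split the complex kernel W = Wr + iWi and the input x = xr + ixi into real parts and expand the product into four real convolutions: (Wr*xr - Wi*xi) + i(Wr*xi + Wi*xr). The pure-Python example below checks that decomposition against native complex arithmetic and then applies CReLU (ReLU on the real and imaginary parts separately), one frequently used complex activation; whether the paper uses this particular activation is an assumption. All function names are hypothetical.

```python
from typing import List


def real_xcorr1d(x: List[float], w: List[float]) -> List[float]:
    """'Valid' 1-D cross-correlation, i.e. what deep-learning layers call convolution."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k)) for i in range(len(x) - k + 1)]


def complex_conv1d(x: List[complex], w: List[complex]) -> List[complex]:
    """Complex convolution assembled from four real ones:
    (Wr + iWi) * (xr + ixi) = (Wr*xr - Wi*xi) + i(Wr*xi + Wi*xr)."""
    xr, xi = [z.real for z in x], [z.imag for z in x]
    wr, wi = [z.real for z in w], [z.imag for z in w]
    re = [p - q for p, q in zip(real_xcorr1d(xr, wr), real_xcorr1d(xi, wi))]
    im = [p + q for p, q in zip(real_xcorr1d(xi, wr), real_xcorr1d(xr, wi))]
    return [complex(r, i) for r, i in zip(re, im)]


def crelu(z: List[complex]) -> List[complex]:
    """CReLU: ReLU applied independently to the real and imaginary parts."""
    return [complex(max(v.real, 0.0), max(v.imag, 0.0)) for v in z]


# Toy complex input (e.g. one frequency bin's values over time) and kernel.
x = [1 + 2j, -0.5 + 1j, 3 - 1j, 0.25 + 0.5j, -2 - 2j]
w = [0.5 - 1j, 1 + 0.5j]

y = complex_conv1d(x, w)
# Reference: the same cross-correlation computed with native complex numbers.
ref = [sum(x[i + j] * w[j] for j in range(len(w))) for i in range(len(x) - len(w) + 1)]
assert all(abs(p - q) < 1e-12 for p, q in zip(y, ref))
print(crelu(y))
```

Because the layer multiplies complex numbers rather than magnitudes, both the magnitude and the phase of each output remain available to later layers.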

Findings

01

Outperforms previous methods on the 'In-the-Wild' dataset

02

Retains phase information, which magnitude spectrograms discard and which affects audio naturalness

03

Enables explainable AI techniques for model interpretation
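
As a hedged illustration of why a frequency-domain input helps interpretability, the sketch below applies occlusion, a simple model-agnostic attribution method (not necessarily the explainable-AI technique used in the paper), to a toy scorer over complex frequency bins: each bin is zeroed in turn, and the change in the score is attributed to that bin. Because each input position is a frequency bin, the attributions read directly as "which frequencies mattered". The scorer `toy_score` and its weights are made up for illustration.

```python
from typing import Callable, List


def toy_score(bins: List[complex]) -> float:
    """Hypothetical 'spoof score': magnitude of a fixed complex-weighted sum of bins."""
    weights = [0.1 + 0j, 2.0 - 1.0j, 0.0 + 0j, 0.5 + 0.5j]
    return abs(sum(wk * xk for wk, xk in zip(weights, bins)))


def occlusion(score: Callable[[List[complex]], float],
              bins: List[complex]) -> List[float]:
    """Attribution per bin = score(original) - score(with that bin zeroed).
    Positive: the bin raised the score; negative: removing it raises the score."""
    base = score(bins)
    attributions = []
    for k in range(len(bins)):
        occluded = list(bins)
        occluded[k] = 0j
        attributions.append(base - score(occluded))
    return attributions


bins = [1 + 1j, 0.5 - 0.5j, 3 + 0j, 1j]
attr = occlusion(toy_score, bins)
for k, a in enumerate(attr):
    print(f"bin {k}: {a:+.3f}")
```

In this toy, bin 2 has the largest magnitude yet receives zero attribution because its weight is zero, and attributions can be negative because complex contributions can cancel inside the magnitude.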

Abstract

Current anti-spoofing and audio deepfake detection systems use either magnitude spectrogram-based features (such as CQT or mel spectrograms) or raw audio processed through convolution or sinc layers. Both methods have drawbacks: magnitude spectrograms discard phase information, which affects audio naturalness, and raw-feature-based models cannot use traditional explainable AI methods. This paper proposes a new approach that combines the benefits of both methods by using complex-valued neural networks to process the complex-valued CQT frequency-domain representation of the input audio. This method retains phase information and allows for explainable AI methods. Results show that this approach outperforms previous methods on the "In-the-Wild" anti-spoofing dataset and enables interpretation of the results through explainable AI. Ablation studies confirm that the model has learned to use…
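
The CQT mentioned above assigns each bin k a geometrically spaced centre frequency f_k = f_min * 2^(k/b), with b bins per octave, and a window whose length N_k shrinks as f_k grows, so that every bin has the same quality factor Q = 1/(2^(1/b) - 1). The sketch below computes one frame directly from this classic definition in pure Python. It is slow and only meant to show that each CQT bin is a complex number whose phase is available alongside its magnitude; the function name `naive_cqt` and all parameter values are illustrative assumptions, not the paper's feature-extraction settings, and practical code would use an optimized library implementation.

```python
import cmath
import math
from typing import List


def naive_cqt(x: List[float], fs: float, f_min: float,
              n_bins: int, bins_per_octave: int = 12) -> List[complex]:
    """One CQT frame computed directly from the definition, starting at x[0]."""
    q = 1.0 / (2 ** (1.0 / bins_per_octave) - 1)  # constant quality factor
    out = []
    for k in range(n_bins):
        f_k = f_min * 2 ** (k / bins_per_octave)  # geometrically spaced centre frequency
        n_k = int(round(q * fs / f_k))            # window length shrinks as f_k grows
        if n_k > len(x):
            raise ValueError("signal too short for the lowest bins")
        acc = 0j
        for n in range(n_k):
            hamming = 0.54 - 0.46 * math.cos(2 * math.pi * n / (n_k - 1))
            # Q cycles per window, i.e. a complex exponential at roughly f_k.
            acc += hamming * x[n] * cmath.exp(-2j * math.pi * q * n / n_k)
        out.append(acc / n_k)                     # normalise by window length
    return out


fs = 8000.0
tone = [math.sin(2 * math.pi * 440.0 * t / fs) for t in range(4000)]
X = naive_cqt(tone, fs, f_min=220.0, n_bins=25, bins_per_octave=12)
peak = max(range(len(X)), key=lambda k: abs(X[k]))
print(peak, round(220.0 * 2 ** (peak / 12)))  # 12 440
# The phase at the peak bin: exactly what a magnitude-only CQT throws away.
print(round(cmath.phase(X[peak]), 3))
```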

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing

Methods: Convolution