Complex-valued neural networks for voice anti-spoofing
Nicolas M. Müller, Philip Sperl, Konstantin Böttinger

TL;DR
This paper introduces complex-valued neural networks that process complex frequency representations of audio, retaining phase information and improving voice anti-spoofing detection while enabling explainability.
Contribution
It proposes a novel complex-valued neural network approach that combines magnitude and phase information for improved anti-spoofing detection and interpretability.
Findings
Outperforms previous methods on the 'In-the-Wild' dataset
Retains phase information, which magnitude spectrograms discard and which affects perceived audio naturalness
Enables explainable AI techniques for model interpretation
Abstract
Current anti-spoofing and audio deepfake detection systems use either magnitude spectrogram-based features (such as CQT or Melspectrograms) or raw audio processed through convolution or sinc-layers. Both methods have drawbacks: magnitude spectrograms discard phase information, which affects audio naturalness, and raw-feature-based models cannot use traditional explainable AI methods. This paper proposes a new approach that combines the benefits of both methods by using complex-valued neural networks to process the complex-valued, CQT frequency-domain representation of the input audio. This method retains phase information and allows for explainable AI methods. Results show that this approach outperforms previous methods on the "In-the-Wild" anti-spoofing dataset and enables interpretation of the results through explainable AI. Ablation studies confirm that the model has learned to use…
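To make the two ideas in the abstract concrete, here is a minimal NumPy sketch. It is not the authors' model: a plain FFT stands in for the CQT, and the helper `complex_conv1d`, the signal length, the shift, and the kernel size are all hypothetical choices made for illustration. The first part shows that two different signals can share the same magnitude spectrum, so a magnitude-only feature cannot tell them apart. The second part shows how one complex-valued convolution over a complex spectrum can be assembled from four real-valued ones, which is the basic operation a complex-valued layer relies on.

```python
import numpy as np

rng = np.random.default_rng(0)

# Part 1: a magnitude spectrum discards phase.
# A circular shift changes only the phase of the spectrum, not its magnitude.
x = rng.standard_normal(256)          # stand-in for an audio frame (assumption)
y = np.roll(x, 17)                    # a different signal with shifted phase
X, Y = np.fft.rfft(x), np.fft.rfft(y)
same_magnitude = np.allclose(np.abs(X), np.abs(Y))   # magnitude view: identical
same_complex = np.allclose(X, Y)                     # complex view: different


# Part 2: a complex-valued 1D convolution built from real-valued ones.
def complex_conv1d(z, w):
    """Valid-mode sliding product sum_n z[k + n] * w[n] for complex z and w.

    Uses (a + ib)(c + id) = (ac - bd) + i(ad + bc), so only real-valued
    correlations are needed. Hypothetical helper, not from the paper.
    """
    zr, zi = z.real, z.imag
    wr, wi = w.real, w.imag

    def corr(a, b):
        return np.correlate(a, b, mode="valid")

    real = corr(zr, wr) - corr(zi, wi)
    imag = corr(zr, wi) + corr(zi, wr)
    return real + 1j * imag


z = X[:32]                                                 # a slice of complex bins
w = rng.standard_normal(5) + 1j * rng.standard_normal(5)   # random complex kernel
out = complex_conv1d(z, w)                                 # complex output, phase kept
```

Because the output of `complex_conv1d` is itself complex, later layers still see both magnitude and phase, whereas taking `np.abs` at the input would have removed the difference between `x` and `y` before the network ever saw it.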
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
Methods: Convolution
