Spoofing Detection Goes Noisy: An Analysis of Synthetic Speech Detection in the Presence of Additive Noise
Cemal Hanilci, Tomi Kinnunen, Md Sahidullah, Aleksandr Sizov

TL;DR
This paper investigates the robustness of synthetic speech detection methods in noisy environments, revealing their vulnerability to noise and exploring feature combinations to improve detection accuracy.
Contribution
It provides a comprehensive analysis of state-of-the-art spoofing detectors under additive noise, highlighting their limitations and proposing fusion strategies for better performance.
Findings
All detectors fail under additive noise, even at relatively high SNRs.
Speech enhancement does not improve detection.
Fusion of features enhances robustness and accuracy.
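Score-level fusion is one common way to combine detectors built on different front-end features. This summary does not say which fusion strategies the paper uses, so the sketch below is an assumption, not the authors' method. Each detector's scores are z-normalized with statistics from a development set and then combined as a weighted sum. `fuse_scores` and its equal default weights are hypothetical; in practice the weights would usually be trained on development data.

```python
import numpy as np
from typing import Optional, Sequence


def fuse_scores(
    eval_scores: Sequence[np.ndarray],
    dev_scores: Sequence[np.ndarray],
    weights: Optional[Sequence[float]] = None,
) -> np.ndarray:
    """Hypothetical linear score fusion of several spoofing detectors.

    eval_scores[k] and dev_scores[k] hold the per-trial scores of detector k
    (e.g. one detector per front-end feature) on the evaluation and
    development sets.
    """
    n_systems = len(eval_scores)
    if n_systems == 0 or n_systems != len(dev_scores):
        raise ValueError("need one dev score array per eval score array")
    if weights is None:
        # Equal weights: a simple, untrained baseline (assumption).
        weights = [1.0 / n_systems] * n_systems
    if len(weights) != n_systems:
        raise ValueError("need one weight per detector")

    fused = np.zeros(len(eval_scores[0]), dtype=float)
    for w, ev, dv in zip(weights, eval_scores, dev_scores):
        dv = np.asarray(dv, dtype=float)
        # z-normalize with development-set statistics so detectors with
        # different score ranges contribute on a comparable scale.
        fused += w * (np.asarray(ev, dtype=float) - dv.mean()) / dv.std()
    return fused
```

For example, `fuse_scores([scores_feat_a, scores_feat_b], [dev_feat_a, dev_feat_b])` averages two normalized detectors (the argument names are placeholders). Normalizing with development statistics, rather than evaluation statistics, avoids peeking at the test data.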
Abstract
Automatic speaker verification (ASV) technology has recently been finding its way into end-user applications for secure access to personal data, smart services or physical facilities. Similar to other biometric technologies, speaker verification is vulnerable to spoofing attacks where an attacker masquerades as a particular target speaker via impersonation, replay, text-to-speech (TTS) or voice conversion (VC) techniques to gain illegitimate access to the system. We focus on TTS and VC, which represent the most flexible, high-end spoofing attacks. Most of the prior studies on synthesized or converted speech detection report their findings using high-quality clean recordings. Meanwhile, the performance of spoofing detectors in the presence of additive noise, an important consideration in practical ASV implementations, remains largely unknown. To this end, we analyze the suitability of…
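The basic experimental manipulation, corrupting a clean (genuine or spoofed) utterance with additive noise at a controlled signal-to-noise ratio, can be sketched as follows. This is a minimal illustration assuming a global, power-based definition SNR = 10 log10(P_speech / P_noise). A higher SNR means relatively less noise, so failing "even at relatively high SNRs" means that mild noise already hurts. The paper's exact mixing procedure (noise sources, level measurement, SNR grid) is not given in this summary. The helpers `add_noise_at_snr` and `measured_snr_db` are hypothetical, and the sine tone and Gaussian noise are stand-ins rather than data from the paper.

```python
import numpy as np


def add_noise_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Return speech + scaled noise whose global SNR equals snr_db (hypothetical helper)."""
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)
    if len(noise) < len(speech):
        # Repeat a short noise recording until it covers the whole utterance.
        reps = int(np.ceil(len(speech) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[: len(speech)]

    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    if p_noise == 0:
        raise ValueError("noise signal has zero power")
    # Solve 10*log10(p_speech / (gain**2 * p_noise)) = snr_db for gain.
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise


def measured_snr_db(clean: np.ndarray, noisy: np.ndarray) -> float:
    """Global SNR in dB, treating noisy - clean as the added noise."""
    clean = np.asarray(clean, dtype=float)
    residual = np.asarray(noisy, dtype=float) - clean
    return float(10.0 * np.log10(np.mean(clean ** 2) / np.mean(residual ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(16000) / 16000.0               # 1 s at an assumed 16 kHz rate
    speech = 0.5 * np.sin(2 * np.pi * 220 * t)   # stand-in for a real utterance
    noise = rng.standard_normal(4000)            # stand-in for a noise recording
    for target in (20.0, 10.0, 0.0):
        noisy = add_noise_at_snr(speech, noise, target)
        print(target, round(measured_snr_db(speech, noisy), 3))
```

Running the same detector on the outputs for a sweep of `snr_db` values is one way to chart how detection accuracy degrades as noise increases.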
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
