The Impact of Silence on Speech Anti-Spoofing
Yuxiang Zhang, Zhuo Li, Jingze Lu, Hua Hua, Wenchao Wang, Pengyuan Zhang

TL;DR
This paper investigates how silence affects speech anti-spoofing systems. It shows that both the content and the duration of silence significantly influence detection accuracy, and it proposes masking silence to improve robustness against unknown spoofing attacks.
Contribution
The study analyzes the impact of silence on anti-spoofing models, visualizes the models' attention distribution, and proposes masking silence to improve robustness against unknown spoofing attacks.
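A minimal sketch of silence masking at the waveform level follows. It assumes that "masking" means replacing the samples in frames that an energy-based detector labels as silence with a constant, so the utterance keeps its length while the silence content no longer carries waveform-generator artifacts. The helper `mask_silence`, the 400-sample frame (25 ms at 16 kHz), and the -40 dB threshold are hypothetical choices for illustration. The paper's own masking may work on features or inside training in a different way.

```python
import math
from typing import List, Tuple


def mask_silence(samples: List[float], frame_len: int = 400,
                 threshold_db: float = -40.0,
                 fill: float = 0.0) -> Tuple[List[float], List[bool]]:
    """Hypothetical helper: replace samples in low-energy frames with `fill`.

    Returns the masked signal (same length as the input) and a per-frame
    mask in which True marks a frame treated as silence. The threshold and
    frame length are illustrative assumptions, not values from the paper.
    """
    masked = list(samples)
    frame_mask = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        # Mean-square energy in dB; the small floor avoids log10(0).
        power = sum(x * x for x in frame) / len(frame)
        silent = 10.0 * math.log10(power + 1e-10) < threshold_db
        frame_mask.append(silent)
        if silent:
            # Overwrite the whole frame so its content cannot be used as a cue.
            for i in range(start, start + len(frame)):
                masked[i] = fill
    return masked, frame_mask


# Faint noise (a stand-in for generator-specific silence) followed by "speech".
signal = [0.001] * 400 + [0.5] * 400
masked, mask = mask_silence(signal)
print(mask)  # [True, False]
```

In this sketch, masking preserves the silence duration while discarding what the silence contains, whereas removing silence discards both.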
Findings
The proportion of silence duration is lower in TTS-generated spoof speech than in bonafide speech.
Removing silence increases error rates on spoof speech generated by neural-network-based end-to-end TTS.
Masking silence improves model robustness against certain spoofing attacks.
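The first finding depends on measuring what fraction of an utterance is silence. The sketch below estimates this fraction with a simple frame-energy detector. The names `frame_energies_db` and `silence_proportion`, the frame length, and the -40 dB threshold are hypothetical choices for illustration, not the VAD used in the paper.

```python
import math
from typing import List


def frame_energies_db(samples: List[float], frame_len: int = 400) -> List[float]:
    """Mean-square energy (dB) of each complete, non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        # Small floor avoids log10(0) on digital silence.
        energies.append(10.0 * math.log10(power + 1e-10))
    return energies


def silence_proportion(samples: List[float], frame_len: int = 400,
                       threshold_db: float = -40.0) -> float:
    """Fraction of frames whose energy falls below threshold_db (assumed VAD)."""
    energies = frame_energies_db(samples, frame_len)
    if not energies:
        return 0.0
    silent = sum(1 for e in energies if e < threshold_db)
    return silent / len(energies)


# One silent frame followed by three voiced frames.
signal = [0.0] * 400 + [0.5] * 1200
print(silence_proportion(signal))  # 0.25
```

Under the finding above, bonafide utterances would tend to score higher on such a measure than TTS spoof utterances, although the exact values depend heavily on the detector settings.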
Abstract
Current speech anti-spoofing countermeasures (CMs) perform excellently on specific datasets. However, removing the silence from test speech with Voice Activity Detection (VAD) can severely degrade their performance. In this paper, the impact of silence on speech anti-spoofing is analyzed. First, the reasons for the impact are explored, including the proportion of silence duration and the content of silence. The proportion of silence duration in spoof speech generated by text-to-speech (TTS) algorithms is lower than that in bonafide speech, and the content of silence generated by different waveform generators differs from that of bonafide speech. Then the impact of silence on model prediction is explored. Even after retraining, spoof speech generated by neural-network-based end-to-end TTS algorithms suffers a significant rise in error rates when the silence is removed. To…
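The VAD-based silence removal described in the abstract can be sketched with the same kind of energy detector. `is_silent` and `remove_silence` are hypothetical helpers. With `trim_only=True`, the function drops only leading and trailing silent frames. Otherwise, it drops every silent frame, including pauses between words. This summary does not say which variant the paper uses, and a real pipeline would normally rely on a dedicated VAD rather than a fixed energy threshold.

```python
import math
from typing import List


def is_silent(frame: List[float], threshold_db: float = -40.0) -> bool:
    """Assumed energy-based VAD decision for one non-empty frame."""
    power = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(power + 1e-10) < threshold_db


def remove_silence(samples: List[float], frame_len: int = 400,
                   threshold_db: float = -40.0,
                   trim_only: bool = False) -> List[float]:
    """Drop frames labeled as silence and concatenate what remains.

    trim_only=True removes only leading and trailing silent frames;
    otherwise every silent frame is removed. The last frame may be shorter
    than frame_len and is judged like any other frame.
    """
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    flags = [is_silent(f, threshold_db) for f in frames]
    if trim_only:
        voiced = [i for i, silent in enumerate(flags) if not silent]
        if not voiced:
            return []
        kept = frames[voiced[0]:voiced[-1] + 1]
    else:
        kept = [f for f, silent in zip(frames, flags) if not silent]
    return [x for f in kept for x in f]


# silence, speech, pause, speech, silence (400 samples each)
signal = [0.0] * 400 + [0.5] * 400 + [0.0] * 400 + [0.5] * 400 + [0.0] * 400
print(len(remove_silence(signal)))                  # 800
print(len(remove_silence(signal, trim_only=True)))  # 1200
```

Both variants shorten the utterance and change its silence proportion, which is the property the abstract links to the rise in error rates.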
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Voice and Speech Disorders
