RW-Resnet: A Novel Speech Anti-Spoofing Model Using Raw Waveform
Youxuan Ma, Zongze Ren, Shugong Xu

TL;DR
This paper introduces RW-Resnet, a novel speech anti-spoofing model that processes raw waveforms with Conv1D Resblocks and Resnet34, demonstrating superior detection of synthetic speech attacks on the ASVspoof2019 LA dataset.
Contribution
The paper presents a new raw waveform-based anti-spoofing model combining Conv1D Resblocks and Resnet34, outperforming existing methods in synthetic speech detection.
Findings
Achieves better performance than state-of-the-art models on the ASVspoof2019 LA dataset
Effectively detects synthetic speech attacks
Utilizes raw waveform for feature extraction
Abstract
In recent years, synthetic speech generated by advanced text-to-speech (TTS) and voice conversion (VC) systems has caused great harm to automatic speaker verification (ASV) systems, which motivates the design of a synthetic speech detection system to protect them. In this paper, we propose a new speech anti-spoofing model named ResWavegram-Resnet (RW-Resnet). The model contains two parts: Conv1D Resblocks and a backbone Resnet34. The Conv1D Resblock is a Conv1D block with a residual connection. In the first part, the raw waveform is fed to the stacked Conv1D Resblocks to obtain the ResWavegram. Compared with traditional methods, ResWavegram keeps all the information from the audio signal and has a stronger ability to extract features. In the second part, the extracted features are fed to the backbone Resnet34 for the spoofed or bonafide decision. The…
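The residual Conv1D idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give channel counts, kernel sizes, strides, or activations, so the version below assumes a single channel, odd kernel lengths, zero padding, and a conv-ReLU-conv residual branch. The function names (`conv1d_same`, `relu`, `conv1d_resblock`) are hypothetical and chosen for illustration only.

```python
from typing import List


def conv1d_same(x: List[float], kernel: List[float]) -> List[float]:
    """Single-channel 1D cross-correlation with zero padding.

    The output has the same length as the input (assumes an odd kernel length).
    """
    k = len(kernel)
    if k % 2 != 1:
        raise ValueError("kernel length must be odd")
    pad = k // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(padded[i + j] * kernel[j] for j in range(k))
        for i in range(len(x))
    ]


def relu(x: List[float]) -> List[float]:
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]


def conv1d_resblock(
    x: List[float], kernel_a: List[float], kernel_b: List[float]
) -> List[float]:
    """Residual block: output = x + conv_b(relu(conv_a(x))).

    The conv-ReLU-conv branch is an assumption; only the skip
    connection itself is stated in the abstract.
    """
    branch = conv1d_same(relu(conv1d_same(x, kernel_a)), kernel_b)
    return [xi + bi for xi, bi in zip(x, branch)]


# Usage: stack two blocks on a toy "waveform" (illustrative values only).
waveform = [0.5, -1.0, 0.25, 0.0, 0.75]
features = conv1d_resblock(waveform, [0.1, 0.8, 0.1], [0.2, 0.6, 0.2])
features = conv1d_resblock(features, [0.3, 0.4, 0.3], [0.0, 1.0, 0.0])
```

Because the input is added back to the branch output, a block whose second kernel is all zeros passes the waveform through unchanged, which is the property that makes stacked residual blocks easier to train. In the model described above, the output of the stacked blocks would then go to the Resnet34 backbone, which this sketch does not cover.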
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Voice and Speech Disorders
