RW-Resnet: A Novel Speech Anti-Spoofing Model Using Raw Waveform

Youxuan Ma; Zongze Ren; Shugong Xu

arXiv:2108.05684 · cs.SD · August 16, 2021 · 5 cites

Open Access

TL;DR

This paper introduces RW-Resnet, a novel speech anti-spoofing model that processes raw waveforms with Conv1D Resblocks and Resnet34, demonstrating superior detection of synthetic speech attacks on the ASVspoof2019 LA dataset.

Contribution

The paper presents a new raw waveform-based anti-spoofing model combining Conv1D Resblocks and Resnet34, outperforming existing methods in synthetic speech detection.

Findings

01. Achieves better performance than state-of-the-art models
02. Effectively detects synthetic speech attacks
03. Utilizes raw waveform for feature extraction

Abstract

In recent years, synthetic speech generated by advanced text-to-speech (TTS) and voice conversion (VC) systems has caused great harm to automatic speaker verification (ASV) systems, making it urgent to design synthetic speech detection systems that protect them. In this paper, we propose a new speech anti-spoofing model named ResWavegram-Resnet (RW-Resnet). The model contains two parts: Conv1D Resblocks and a Resnet34 backbone. The Conv1D Resblock is a Conv1D block with a residual connection. In the first part, the raw waveform is fed to stacked Conv1D Resblocks to obtain the ResWavegram. Compared with traditional methods, the ResWavegram keeps all the information from the audio signal and has a stronger ability to extract features. In the second part, the extracted features are fed to the Resnet34 backbone for the spoofed-or-bonafide decision. The…
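The abstract describes the Conv1D Resblock only as a Conv1D block with a residual connection, applied in a stack to the raw waveform. The following is a minimal, dependency-free sketch of that general idea, not the authors' implementation. It is single-channel, and the kernel values, kernel sizes, ReLU placement, and all function names (`conv1d_same`, `conv1d_resblock`, `stacked_resblocks`) are illustrative assumptions; the Resnet34 backbone is not shown.

```python
# Toy sketch of a 1-D convolution block with a residual (skip) connection.
# Single channel, plain Python lists; every size and kernel value here is an
# assumption for illustration, not the configuration used in the paper.

def conv1d_same(signal, kernel):
    """Cross-correlate `signal` with `kernel`, zero-padding the ends so the
    output has the same length as the input (odd kernel length required)."""
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd for 'same' padding")
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def conv1d_resblock(signal, kernel_a, kernel_b):
    """Hypothetical residual block: conv -> ReLU -> conv, then add the input.

    The skip connection means the block outputs x + F(x), so it only has to
    learn a correction F(x) on top of the unchanged input x.
    """
    hidden = relu(conv1d_same(signal, kernel_a))
    residual = conv1d_same(hidden, kernel_b)
    return [x + r for x, r in zip(signal, residual)]


def stacked_resblocks(waveform, blocks):
    """Pass a raw waveform through a sequence of residual blocks.

    `blocks` is a list of (kernel_a, kernel_b) pairs, one pair per block.
    """
    features = list(waveform)
    for kernel_a, kernel_b in blocks:
        features = conv1d_resblock(features, kernel_a, kernel_b)
    return features
```

With all-zero kernels the learned branch contributes nothing and each block reduces to the identity, which is the property that makes deep stacks of residual blocks easier to train. A real front end of this kind would typically use many channels and learned kernels, and would produce a two-dimensional feature map for the image-style backbone; those details are not given in the text above.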

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Voice and Speech Disorders