Time Domain Adversarial Voice Conversion for ADD 2022

Cheng Wen; Tingwei Guo; Xingjun Tan; Rui Yan; Shuran Zhou; Chuandong Xie; Wei Zou; Xiangang Li

arXiv:2204.08692 · eess.AS · April 21, 2022

Open Access

TL;DR

This paper presents a time domain adversarial voice conversion system for the ADD 2022 challenge, capable of generating convincing fake speech that can deceive anti-spoofing detectors while maintaining good audio quality.

Contribution

The authors introduce a novel time domain post-processing step for voice conversion that enhances deception ability against anti-spoofing detectors.

Findings

01. System ranks top in ADD 2022 Track 3.1
02. Demonstrates strong adversarial ability against anti-spoofing detectors
03. Maintains acceptable audio quality and speaker similarity

Abstract

In this paper, we describe our speech generation system for the first Audio Deep Synthesis Detection Challenge (ADD 2022). First, we build an any-to-many voice conversion (VC) system to convert source speech with arbitrary language content into the target speaker's fake speech. The converted speech generated by the VC system is then post-processed in the time domain to improve its deception ability. The experimental results show that our system has adversarial ability against anti-spoofing detectors, with only a small compromise in audio quality and speaker similarity. This system ranks top in Track 3.1 of ADD 2022, showing that our method also generalizes well across different detectors.

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing