The HCCL-DKU system for fake audio generation task of the 2022 ICASSP ADD Challenge

Ziyi Chen; Hua Hua; Yuxiang Zhang; Ming Li; Pengyuan Zhang

arXiv:2201.12567·cs.SD·February 1, 2022

TL;DR

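This paper presents a fully end-to-end voice conversion system based on phonetic posteriorgrams (PPGs). It outperforms Tacotron-based and Fastspeech-based conversion models in conversion quality and spoofing performance, and it placed second in the fake audio generation task of the 2022 ICASSP ADD challenge with a deception success rate of 0.916.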

Contribution

The paper presents a new, fully end-to-end PPG-based voice conversion model that outperforms Tacotron-based and Fastspeech-based conversion models in conversion quality and in spoofing performance against anti-spoofing systems.

Findings

01

Outperforms Tacotron-based and Fastspeech-based conversion models in conversion quality

02

Achieves a deception success rate of 0.916 against anti-spoofing systems

03

Secures second place in the fake audio generation task of the 2022 ICASSP ADD challenge

Abstract

The voice conversion task is to modify the speaker identity of continuous speech while preserving its linguistic content. Naturalness and similarity are the two main metrics for evaluating conversion quality, which has improved significantly in recent years. This paper presents the HCCL-DKU entry for the fake audio generation task of the 2022 ICASSP ADD challenge. We propose a novel PPG-based voice conversion model that adopts a fully end-to-end structure. Experimental results show that the proposed method outperforms other conversion models, including Tacotron-based and Fastspeech-based models, in both conversion quality and spoofing performance against anti-spoofing systems. In addition, we investigate several post-processing methods to improve spoofing power. Finally, we achieve second place in the ADD challenge with a deception success rate of 0.916.
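
The abstract reports a deception success rate without defining it. As a rough illustration only, the sketch below assumes the rate is the fraction of (detector, fake utterance) pairs in which the detector accepts the fake as genuine. The function name `deception_success_rate` and the nested-list data layout are hypothetical and are not taken from the paper or from any challenge toolkit.

```python
from typing import Sequence


def deception_success_rate(accepted: Sequence[Sequence[bool]]) -> float:
    """Fraction of (detector, utterance) pairs where a fake was accepted as genuine.

    accepted[m][i] is True when detector m judged fake utterance i to be genuine.
    This definition is an assumption made for illustration.
    """
    if not accepted or not accepted[0]:
        raise ValueError("need at least one detector and one utterance")
    n_utterances = len(accepted[0])
    if any(len(row) != n_utterances for row in accepted):
        raise ValueError("every detector must score the same utterances")
    # Count every wrong "genuine" decision across all detectors.
    fooled = sum(sum(1 for decision in row if decision) for row in accepted)
    return fooled / (len(accepted) * n_utterances)


# Toy example: two detectors, four fake utterances, six wrong decisions in total.
toy_decisions = [
    [True, True, False, True],
    [True, False, True, True],
]
toy_rate = deception_success_rate(toy_decisions)  # 6 / 8
```

Under this assumed definition, a rate of 0.916 would mean that roughly 92 of every 100 detector decisions on the submitted fake audio were fooled.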

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing