AVQVC: One-shot Voice Conversion by Vector Quantization with applying contrastive learning

Huaizhen Tang; Xulong Zhang; Jianzong Wang; Ning Cheng; Jing Xiao

arXiv:2202.10020·cs.SD·February 22, 2022

TL;DR

This paper introduces AVQVC, a novel one-shot voice conversion framework that leverages vector quantization and contrastive learning to better disentangle content and timbre, resulting in improved speech quality.

Contribution

It proposes a new training method for VQVC that enhances separation of content and timbre, advancing one-shot voice conversion techniques.

Findings

01

Better separation of content and timbre than previous VQVC methods

02

Improved sound quality of converted speech

03

Effective use of contrastive learning in voice conversion

Abstract

Voice conversion (VC) refers to changing the timbre of an utterance while retaining its linguistic content. Recently, many works have focused on disentanglement-based learning techniques that separate timbre from linguistic content in a speech signal. Once the two are separated, voice conversion becomes feasible and straightforward. This paper proposes a novel one-shot voice conversion framework, called AVQVC, based on vector quantization voice conversion (VQVC) and AutoVC. A new training method is applied to VQVC to separate content and timbre information from speech more effectively. The results show that this approach separates content and timbre better than VQVC, which improves the sound quality of the generated speech.
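
As a rough illustration of the vector quantization step that VQVC-style systems build on, the sketch below maps each frame of a feature sequence to its nearest codebook vector and computes the time-averaged residual. It is a generic nearest-neighbour quantizer, not the AVQVC implementation: the function names, the toy dimensions, and the reading of the quantized codes as content-like and the averaged residual as utterance-level (timbre-like) information are assumptions made for illustration and are not taken from this page. The contrastive training objective is not shown.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(frames, codebook):
    """Replace each frame with its nearest codebook vector.

    Returns the quantized frames and the chosen codebook indices.
    """
    indices = [
        min(range(len(codebook)), key=lambda k: squared_distance(frame, codebook[k]))
        for frame in frames
    ]
    quantized = [codebook[k] for k in indices]
    return quantized, indices


def mean_residual(frames, quantized):
    """Average over time of (frame - quantized frame), one value per dimension."""
    num_frames = len(frames)
    dim = len(frames[0])
    return [
        sum(f[d] - q[d] for f, q in zip(frames, quantized)) / num_frames
        for d in range(dim)
    ]


# Toy data (hypothetical sizes): 8 codebook vectors and 20 frames, each 4-dimensional.
random.seed(0)
codebook = [[random.gauss(0, 1) for _ in range(4)] for _ in range(8)]
frames = [[random.gauss(0, 1) for _ in range(4)] for _ in range(20)]

quantized, indices = quantize(frames, codebook)  # discrete, content-like codes (assumption)
residual = mean_residual(frames, quantized)      # utterance-level summary (assumption)
```

In a trained system the codebook and the encoder producing the frames would be learned jointly; here both are random and only the lookup itself is demonstrated.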

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Data Compression Techniques