DGC-vector: A new speaker embedding for zero-shot voice conversion

Ruitong Xiao; Haitong Zhang; Yue Lin

arXiv:2203.09722 · cs.SD · March 21, 2022

TL;DR

This paper introduces DGC-vector, a speaker embedding for zero-shot voice conversion that combines a D-vector, a global style token (GST) based speaker representation, and auxiliary supervision to improve the speaker similarity of the converted speech.

Contribution

The paper proposes a new speaker embedding that improves speaker similarity in zero-shot voice conversion over D-vector and GST-based speaker embeddings.

Findings

1. Significantly higher speaker similarity than D-vector and GST-based speaker embeddings.

2. Combining a D-vector, a GST-based representation, and auxiliary supervision yields a stronger speaker representation than any of them alone.

3. Decent overall performance on zero-shot voice conversion in both objective and subjective evaluations.

Abstract

Recently, a growing number of zero-shot voice conversion algorithms have been proposed. As a fundamental component of zero-shot voice conversion, speaker embeddings are key to improving the speaker similarity of the converted speech. In this paper, we study the impact of speaker embeddings on zero-shot voice conversion performance. To better represent the characteristics of the target speaker and improve speaker similarity in zero-shot voice conversion, we propose a novel speaker representation method. Our method combines the advantages of D-vector, global style token (GST) based speaker representation, and auxiliary supervision. Objective and subjective evaluations show that the proposed method achieves decent performance on zero-shot voice conversion and significantly improves speaker similarity over D-vector and GST-based speaker embeddings.
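The abstract names three ingredients but does not say how they are wired together. The Python sketch below is a forward-pass illustration under our own assumptions, not the DGC-vector architecture. It treats the GST component as single-head scaled dot-product attention over style tokens and fuses the two embeddings by concatenating their L2-normalized vectors. It also models auxiliary supervision as a linear speaker classifier trained with cross-entropy. All function names and toy vectors are hypothetical. In a trained system, the style tokens and classifier weights would be learned parameters.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    # Guard against a zero vector by falling back to a norm of 1.
    norm = math.sqrt(dot(v, v)) or 1.0
    return [x / norm for x in v]


def gst_embedding(reference, style_tokens):
    """Hypothetical GST layer: the reference-encoder output is the query and
    the style tokens serve as both keys and values (single head)."""
    d = len(reference)
    scores = [dot(reference, tok) / math.sqrt(d) for tok in style_tokens]
    weights = softmax(scores)  # attention weights sum to 1
    return [sum(w * tok[i] for w, tok in zip(weights, style_tokens)) for i in range(d)]


def combined_speaker_embedding(d_vector, reference, style_tokens):
    # Assumption: fuse by concatenating the normalized D-vector and GST embedding.
    return l2_normalize(d_vector) + l2_normalize(gst_embedding(reference, style_tokens))


def auxiliary_speaker_loss(embedding, class_weights, speaker_id):
    # Assumption: auxiliary supervision = linear speaker classifier + cross-entropy.
    logits = [dot(w, embedding) for w in class_weights]
    probs = softmax(logits)
    return -math.log(probs[speaker_id])
```

With toy inputs such as `combined_speaker_embedding([0.6, 0.8], [1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])`, the result is a 4-dimensional vector whose two halves each have unit norm. Concatenation keeps both sources intact for the downstream converter. A learned projection or a weighted sum would be equally plausible fusion choices.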
