DGC-vector: A new speaker embedding for zero-shot voice conversion
Ruitong Xiao, Haitong Zhang, Yue Lin

TL;DR
This paper introduces DGC-vector, a novel speaker embedding method that enhances zero-shot voice conversion by combining D-vector, GST, and auxiliary supervision, leading to improved speaker similarity.
Contribution
The paper proposes a new speaker embedding technique that improves speaker similarity over D-vector and GST-based embeddings in zero-shot voice conversion.
Findings
Significant improvement in speaker similarity over D-vector and GST-based embeddings.
Combining D-vector, GST, and auxiliary supervision yields a better representation of the target speaker.
Decent zero-shot voice conversion performance in both objective and subjective evaluations.
Abstract
A growing number of zero-shot voice conversion algorithms have been proposed recently. As a fundamental part of zero-shot voice conversion, speaker embeddings are the key to improving the converted speech's speaker similarity. In this paper, we study the impact of speaker embeddings on zero-shot voice conversion performance. To better represent the characteristics of the target speaker and improve speaker similarity, we propose a novel speaker representation method. Our method combines the advantages of D-vector, global style token (GST) based speaker representation, and auxiliary supervision. Objective and subjective evaluations show that the proposed method achieves decent performance on zero-shot voice conversion and significantly improves speaker similarity over D-vector and GST-based speaker embeddings.
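The abstract names three ingredients (D-vector, GST, auxiliary supervision) but does not say how they are wired together. The sketch below shows one plausible reading, not the authors' architecture: a D-vector-like embedding is used as the query of a GST-style attention over a bank of learnable tokens, and a speaker-classification cross-entropy on the resulting embedding serves as the auxiliary loss. All names, sizes, and the random stand-in values are hypothetical, and real GST layers typically use multi-head attention rather than the single head used here.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def gst_attention(query, tokens):
    """Single-head scaled dot-product attention over a bank of style tokens.

    Returns the weighted sum of tokens and the attention weights.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, t) / scale for t in tokens])
    dim = len(tokens[0])
    embedding = [sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(dim)]
    return embedding, weights


def auxiliary_speaker_loss(embedding, classifier_rows, speaker_id):
    """Cross-entropy of a linear speaker classifier applied to the embedding."""
    probs = softmax([dot(row, embedding) for row in classifier_rows])
    return -math.log(probs[speaker_id])


# Hypothetical sizes and random stand-ins for learned parameters.
rng = random.Random(0)
DIM, NUM_TOKENS, NUM_SPEAKERS = 8, 4, 3

# Assumption: in practice this would come from a pretrained speaker-verification model.
d_vector = [rng.gauss(0, 1) for _ in range(DIM)]
tokens = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_TOKENS)]
classifier = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_SPEAKERS)]

speaker_embedding, attn = gst_attention(d_vector, tokens)
loss = auxiliary_speaker_loss(speaker_embedding, classifier, speaker_id=1)
```

In a trained system, the token bank and classifier would be learned jointly with the conversion model, and the auxiliary loss would be added to the main training objective with some weight. The summary does not state that weight or the exact loss used.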
