Voice Conversion With Just Nearest Neighbors

Matthew Baas; Benjamin van Niekerk; Herman Kamper

arXiv:2305.18975 · eess.AS · May 31, 2023 · 1 citation

TL;DR

This paper introduces kNN-VC, a simple any-to-any voice conversion method that runs a nearest-neighbor search over self-supervised speech representations. It improves speaker similarity over existing, more complex methods while keeping intelligibility scores similar.

Contribution

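The paper proposes k-nearest neighbors voice conversion (kNN-VC): each frame of the source speech's self-supervised representation is replaced with its nearest neighbor in the reference speech, and a pretrained vocoder synthesizes audio from the result. This keeps the pipeline simple and easier to reproduce while improving speaker similarity at similar intelligibility.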

Findings

01

Improves speaker similarity over baseline methods

02

Maintains intelligibility scores similar to existing methods

03

Simplifies the voice conversion pipeline

Abstract

Any-to-any voice conversion aims to transform source speech into a target voice with just a few examples of the target speaker as a reference. Recent methods produce convincing conversions, but at the cost of increased complexity -- making results difficult to reproduce and build on. Instead, we keep it simple. We propose k-nearest neighbors voice conversion (kNN-VC): a straightforward yet effective method for any-to-any conversion. First, we extract self-supervised representations of the source and reference speech. To convert to the target speaker, we replace each frame of the source representation with its nearest neighbor in the reference. Finally, a pretrained vocoder synthesizes audio from the converted representation. Objective and subjective evaluations show that kNN-VC improves speaker similarity with similar intelligibility scores to existing methods. Code, samples, trained…
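The frame-replacement step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `knn_convert`, the use of cosine similarity, and the averaging of the `k` selected frames are assumptions made for the example, and the random arrays in the usage note stand in for real self-supervised features. Feature extraction and the vocoder are omitted.

```python
import numpy as np


def knn_convert(source, reference, k=1):
    """Replace each source frame with the mean of its k nearest reference frames.

    source:    (n_src, dim) array of frame-level features from the source speech
    reference: (n_ref, dim) array of frame-level features from the target speaker
    Returns an (n_src, dim) array built only from reference frames.
    """
    # Assumption: frames are compared by cosine similarity, so rows are
    # scaled to unit length first (all-zero rows are not handled here).
    src = source / np.linalg.norm(source, axis=1, keepdims=True)
    ref = reference / np.linalg.norm(reference, axis=1, keepdims=True)

    # Similarity between every source frame and every reference frame.
    sim = src @ ref.T  # shape (n_src, n_ref)

    # Indices of the k most similar reference frames for each source frame.
    idx = np.argpartition(-sim, k - 1, axis=1)[:, :k]

    # Assumption: the k selected (unnormalized) frames are averaged.
    return reference[idx].mean(axis=1)
```

For instance, `knn_convert(np.random.randn(200, 1024), np.random.randn(3000, 1024), k=4)` returns a `(200, 1024)` array. With `k=1` every output frame is an exact copy of some reference frame, which is why the converted features carry the target speaker's characteristics.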

Code & Models

Repositories

bshall/knn-vc
PyTorch · Official

Models

🤗
fierce-cats/beatrice-trainer
model · ♡ 39

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing