Low-latency Real-time Voice Conversion on CPU

Konstantine Sadov; Matthew Hutter; Asara Near

arXiv:2311.00873 · cs.SD · November 3, 2023 · 1 citation


TL;DR

This paper introduces LLVC, a low-latency, low-resource neural network for real-time any-to-one voice conversion. It runs on consumer CPUs with under 20ms latency and, to the authors' knowledge, has lower latency and resource usage than any other open-source voice conversion model.
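Low latency in a streaming converter comes from processing audio in small chunks as it arrives. Chunked processing matches offline processing only if the model is causal, meaning each output sample depends on current and past input alone. The page does not describe LLVC's streaming internals, so the following is a minimal sketch under that assumption. A toy causal FIR filter stands in for the network. The names `causal_fir` and `StreamingFIR`, the filter taps, and the 320-sample (20 ms at 16 kHz) chunk size are all hypothetical, chosen only for illustration.

```python
from collections import deque


def causal_fir(signal, taps):
    """Offline reference: y[n] = sum_k taps[k] * x[n - k], treating x[m] = 0 for m < 0."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out


class StreamingFIR:
    """Hypothetical chunked processor: the same causal filter, fed one chunk at a time."""

    def __init__(self, taps):
        self.taps = list(taps)
        # Only the last len(taps) - 1 inputs are ever needed; start with silence.
        self.history = deque([0.0] * (len(self.taps) - 1), maxlen=len(self.taps) - 1)

    def process(self, chunk):
        out = []
        for x in chunk:
            past = list(self.history)  # oldest ... newest (newest is x[n - 1])
            acc = self.taps[0] * x
            for k in range(1, len(self.taps)):
                acc += self.taps[k] * past[-k]
            out.append(acc)
            self.history.append(x)  # deque drops the oldest sample automatically
        return out


SAMPLE_RATE_HZ = 16_000
CHUNK = 320  # 20 ms at 16 kHz (illustrative chunk size, not taken from the paper)
TAPS = [0.5, 0.3, 0.2]  # toy filter standing in for a causal model

# 100 ms of a deterministic test signal.
signal = [((i * 7) % 13) / 13.0 - 0.5 for i in range(SAMPLE_RATE_HZ // 10)]

stream = StreamingFIR(TAPS)
streamed = []
for start in range(0, len(signal), CHUNK):
    streamed.extend(stream.process(signal[start:start + CHUNK]))

offline = causal_fir(signal, TAPS)
print(len(streamed), max(abs(a - b) for a, b in zip(streamed, offline)))
```

The filter only looks backward, so carrying the last `len(taps) - 1` input samples between calls is enough for the chunked output to match the offline output. A model that needed future samples would have to buffer them, and that lookahead would add directly to the latency.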

Contribution

The paper presents LLVC, a neural network for real-time voice conversion adapted from earlier audio manipulation and generation architectures. To the authors' knowledge, it achieves the lowest latency and resource usage of any open-source voice conversion model.

Findings

01

Latency under 20ms at a 16kHz sample rate

02

Runs nearly 2.8x faster than real time on a consumer CPU

03

Lowest resource usage among open-source voice conversion models
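The two headline numbers translate into concrete budgets. Below is a minimal worked example. It assumes the 20 ms bound and the "nearly 2.8x" speedup quoted above, so the results are approximate. It also treats the speedup as a plain ratio of audio duration to compute time.

```python
SAMPLE_RATE_HZ = 16_000  # sample rate quoted for LLVC
LATENCY_BOUND_MS = 20    # "latency under 20ms"
SPEEDUP = 2.8            # "nearly 2.8x faster than real-time"

# Number of audio samples that fit inside the latency bound.
samples_in_bound = SAMPLE_RATE_HZ * LATENCY_BOUND_MS // 1000

# Real-time factor: compute time divided by audio duration (< 1 means faster than real time).
real_time_factor = 1 / SPEEDUP

# Compute time needed to convert one 20 ms span of audio at that speed.
compute_ms_per_bound = LATENCY_BOUND_MS / SPEEDUP

print(samples_in_bound)               # 320
print(f"{real_time_factor:.3f}")      # 0.357
print(f"{compute_ms_per_bound:.2f}")  # 7.14
```

End-to-end latency also includes buffering and audio I/O. These figures are therefore only a rough compute budget, not a latency breakdown reported by the paper.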

Abstract

We adapt the architectures of previous audio manipulation and generation neural networks to the task of real-time any-to-one voice conversion. Our resulting model, LLVC (Low-latency Low-resource Voice Conversion), has a latency of under 20ms at a sample rate of 16kHz and runs nearly 2.8x faster than real-time on a consumer CPU. LLVC uses both a generative adversarial architecture and knowledge distillation to attain this performance. To our knowledge, LLVC achieves both the lowest resource usage and the lowest latency of any open-source voice conversion model. We provide open-source samples, code, and pretrained model weights at https://github.com/KoeAI/LLVC.
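The abstract credits knowledge distillation for part of LLVC's efficiency. In distillation, a small, fast student model is trained to reproduce the outputs of a larger teacher model instead of separately labelled targets. The page gives no training details, so this is only a generic sketch of the idea on scalar data. Here, `teacher` is a hypothetical nonlinear stand-in for an expensive converter, and the student is a two-parameter linear model fit by least squares to the teacher's outputs. Neither function reflects LLVC's actual models, losses, or adversarial training.

```python
import math


def teacher(x):
    """Hypothetical stand-in for a slow, high-quality converter."""
    return 0.8 * x + 0.1 * math.tanh(3 * x)


def distill_linear_student(inputs):
    """Fit y = a * x + b to the teacher's outputs by ordinary least squares."""
    targets = [teacher(x) for x in inputs]  # teacher outputs serve as training labels
    n = len(inputs)
    mean_x = sum(inputs) / n
    mean_y = sum(targets) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(inputs, targets))
    var = sum((x - mean_x) ** 2 for x in inputs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b


def student(params, x):
    """Cheap student model: a single multiply-add per sample."""
    a, b = params
    return a * x + b


inputs = [i / 100 for i in range(-100, 101)]  # unlabelled inputs in [-1, 1]
params = distill_linear_student(inputs)
mean_abs_err = sum(abs(student(params, x) - teacher(x)) for x in inputs) / len(inputs)
print(params, mean_abs_err)
```

The student is far cheaper than the teacher and only approximates it. Distillation trades some of the teacher's fidelity for speed, the same trade-off the abstract describes at much larger scale.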


Code & Models

Repositories

koeai/llvc (PyTorch, official)


Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing

Methods: Knowledge Distillation