AC-VC: Non-parallel Low Latency Phonetic Posteriorgrams Based Voice Conversion

Damien Ronssin; Milos Cernak

arXiv:2111.06601·eess.AS·November 15, 2021

TL;DR

This paper introduces AC-VC, a low-latency, non-parallel voice conversion system based on phonetic posteriorgrams. It needs only 57.5 ms of future context, which makes a real-time implementation feasible, and it matches the baseline in naturalness at some cost in speaker similarity.

Contribution

The paper proposes an almost-causal voice conversion system with only 57.5 ms of look-ahead, which makes real-time use feasible while matching the naturalness of a non-causal baseline.

Findings

1. Achieves naturalness on par with the non-causal ASR-TTS baseline of the Voice Conversion Challenge 2020 (MOS 3.5).

2. Uses only 57.5 ms of future context, which allows a future real-time implementation.

3. Shows reduced speaker similarity (65%), which the authors attribute to the missing future context.

Abstract

This paper presents AC-VC (Almost Causal Voice Conversion), a phonetic posteriorgrams based voice conversion system that can perform any-to-many voice conversion while having only 57.5 ms future look-ahead. The complete system is composed of three neural networks trained separately with non-parallel data. While most of the current voice conversion systems focus primarily on quality irrespective of algorithmic latency, this work elaborates on designing a method using a minimal amount of future context thus allowing a future real-time implementation. According to a subjective listening test organized in this work, the proposed AC-VC system achieves parity with the non-causal ASR-TTS baseline of the Voice Conversion Challenge 2020 in naturalness with a MOS of 3.5. In contrast, the results indicate that missing future context impacts speaker similarity. Obtained similarity percentage of 65%…
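To give a feel for what a 57.5 ms look-ahead means in practice, the short sketch below converts it into a number of future audio samples. Only the 57.5 ms figure comes from the abstract; the sample rates and the helper name lookahead_samples are illustrative assumptions, not values taken from the paper.

```python
def lookahead_samples(lookahead_us: int, sample_rate_hz: int) -> int:
    """Return how many future samples a look-ahead (in microseconds) spans."""
    # Integer arithmetic avoids floating-point rounding on values like 57.5 ms.
    return lookahead_us * sample_rate_hz // 1_000_000


LOOKAHEAD_US = 57_500  # 57.5 ms of future context, as stated in the abstract

# Assumed sample rates, chosen only for illustration.
for rate_hz in (16_000, 24_000):
    n = lookahead_samples(LOOKAHEAD_US, rate_hz)
    print(f"{rate_hz} Hz: {n} future samples")
```

The look-ahead is only one part of end-to-end delay: buffering and compute time in a real deployment add to it, so this figure should be read as a lower bound on latency.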
