Disentangled Feature Learning for Real-Time Neural Speech Coding

Xue Jiang; Xiulian Peng; Yuan Zhang; Yan Lu

arXiv:2211.11960 · cs.SD · February 28, 2023 · 1 citation

Open Access

TL;DR

This paper introduces a neural speech coding method that learns disentangled global and local features, improving coding efficiency and enabling real-time voice conversion with fewer parameters and lower latency.

Contribution

It proposes a disentangled feature learning approach for neural speech coding that improves coding efficiency and enables real-time audio editing, such as voice conversion, in the embedding space.

Findings

01 Achieves better coding efficiency through feature disentanglement.

02 Enables real-time voice conversion with fewer parameters.

03 Demonstrates comparable performance to state-of-the-art models.

Abstract

Recently end-to-end neural audio/speech coding has shown its great potential to outperform traditional signal analysis based audio codecs. This is mostly achieved by following the VQ-VAE paradigm where blind features are learned, vector-quantized and coded. In this paper, instead of blind end-to-end learning, we propose to learn disentangled features for real-time neural speech coding. Specifically, more global-like speaker identity and local content features are learned with disentanglement to represent speech. Such a compact feature decomposition not only achieves better coding efficiency by exploiting bit allocation among different features but also provides the flexibility to do audio editing in embedding space, such as voice conversion in real-time communications. Both subjective and objective results demonstrate its coding efficiency and we find that the learned disentangled…
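
The abstract refers to the VQ-VAE paradigm, in which learned features are vector-quantized and coded. As a rough illustration of that quantization step only, the sketch below maps each feature vector to its nearest codebook entry and counts the bits needed to transmit the resulting indices. It is not the paper's model: the codebook values, the frame values, and the helper names `quantize` and `bits_per_index` are all hypothetical, and the fixed-length index coding is an assumption made for simplicity.

```python
import math

def quantize(vector, codebook):
    """Return (index, codeword) of the codebook entry nearest to `vector`."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    index = min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))
    return index, codebook[index]

def bits_per_index(codebook_size):
    """Bits needed to send one index with a fixed-length code (assumption)."""
    return math.ceil(math.log2(codebook_size))

# Hypothetical 2-D codebook with 4 entries and three toy feature frames.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
frames = [(0.1, -0.2), (0.9, 0.8), (0.2, 1.1)]

# Only the indices are transmitted; the decoder looks them up in the same codebook.
indices = [quantize(frame, codebook)[0] for frame in frames]
total_bits = len(indices) * bits_per_index(len(codebook))

print(indices)      # [0, 3, 2]
print(total_bits)   # 6
```

In a scheme of this kind, the bitrate depends on the codebook size and on how often an index is sent, which is the sort of knob the abstract alludes to when it mentions bit allocation among different features.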

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing

Methods: VQ-VAE