NDVQ: Robust Neural Audio Codec with Normal Distribution-Based Vector Quantization

Zhikang Niu; Sanyuan Chen; Long Zhou; Ziyang Ma; Xie Chen; Shujie Liu

arXiv:2409.12717 · eess.AS · September 20, 2024

Open Access

TL;DR

NDVQ introduces a novel vector quantization method using normal distributions to improve robustness and perceptual quality in neural audio codecs, especially under extremely low bandwidth conditions.

Contribution

The paper proposes NDVQ, a distribution-based vector quantization technique that enhances audio codec robustness by explicitly modeling codebook variance, leading to better quality and lower distortion.

Findings

01

NDVQ outperforms existing codecs like EnCodec in audio quality.

02

NDVQ achieves superior zero-shot TTS performance in low bandwidth.

03

The method reduces the signal distortion caused by the VQ codebook's sensitivity to noise.

Abstract

Built upon vector quantization (VQ), discrete audio codec models have achieved great success in audio compression and auto-regressive audio generation. However, existing models face substantial challenges in perceptual quality and signal distortion, especially at extremely low bandwidths, a weakness rooted in the sensitivity of the VQ codebook to noise. This degradation poses significant challenges for several downstream tasks, such as codec-based speech synthesis. To address this issue, we propose a novel VQ method, Normal Distribution-based Vector Quantization (NDVQ), which introduces an explicit margin between the VQ codes by learning a variance. Specifically, our approach maps the waveform to a latent space and quantizes it by selecting the most likely normal distribution, with each codebook entry representing a unique normal distribution defined by its mean and…
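The selection step the abstract describes (pick the codebook entry whose normal distribution makes the latent most likely) can be illustrated with a minimal sketch. It assumes diagonal covariance, a per-entry log-variance parameterization, and that the chosen entry's mean serves as the quantized output; none of these details are confirmed by the text above, and `gaussian_log_likelihood` and `ndvq_quantize` are hypothetical names, not the authors' code.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical codebook entry: (mean vector, per-dimension log-variance).
# Diagonal covariance is an assumption made for this sketch.
Entry = Tuple[Sequence[float], Sequence[float]]


def gaussian_log_likelihood(z: Sequence[float], mean: Sequence[float],
                            log_var: Sequence[float]) -> float:
    """Log density of z under N(mean, diag(exp(log_var)))."""
    total = 0.0
    for zi, mi, lv in zip(z, mean, log_var):
        total += -0.5 * (math.log(2 * math.pi) + lv + (zi - mi) ** 2 / math.exp(lv))
    return total


def ndvq_quantize(z: Sequence[float], codebook: List[Entry]) -> Tuple[int, Sequence[float]]:
    """Return (index, mean) of the entry whose distribution gives z the highest
    log-likelihood. Using the mean as the quantized value is an assumption."""
    scores = [gaussian_log_likelihood(z, mean, lv) for mean, lv in codebook]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, codebook[best][0]


if __name__ == "__main__":
    # Entry 0: narrow distribution at the origin; entry 1: wide distribution at (1, 1).
    codebook = [([0.0, 0.0], [math.log(0.01)] * 2), ([1.0, 1.0], [0.0, 0.0])]
    z = [0.4, 0.4]  # closer to entry 0's mean in Euclidean distance
    print(ndvq_quantize(z, codebook))  # likelihood favors the wide entry 1
```

In this toy case a plain nearest-mean VQ would pick entry 0, while the likelihood rule picks entry 1 because the narrow variance of entry 0 penalizes the offset heavily. When all entries share the same variance, the rule reduces to ordinary nearest-neighbor selection, which is one way to see how a learned variance can act as a per-code margin.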

Taxonomy

Topics: Neural Networks and Applications · Speech and Audio Processing · Music and Audio Processing