Low Bit-Rate Wideband Speech Coding: A Deep Generative Model based Approach
Gang Min, Xiongwei Zhang, Xia Zou, Xiangyang Liu

TL;DR
This paper introduces a deep generative model-based speech coding method that uses vector quantization of MFCCs and WaveGlow, achieving high-quality wideband speech at low bit-rates with improved efficiency and quality over traditional codecs.
Contribution
It presents a novel deep learning-based speech codec combining VQ of MFCCs and WaveGlow, enabling scalable, high-quality wideband speech coding at low bit-rates.
Findings
Outperforms the classic MELPe codec at lower bit-rates
Provides better speech quality on the TIMIT corpus
Operates efficiently without autoregressive sampling
Abstract
Traditional low bit-rate speech coding approaches handle only narrowband speech sampled at 8 kHz, which limits further improvements in speech quality. Motivated by recent successful exploration of deep learning methods for image and speech compression, this paper presents a new approach that applies vector quantization (VQ) to mel-frequency cepstral coefficients (MFCCs) and uses a deep generative model called WaveGlow to provide efficient, high-quality speech coding. The coding feature is solely an 80-dimensional MFCC vector for 16 kHz wideband speech, so speech coding at bit-rates from 1000 to 2000 bit/s can be implemented scalably by applying different VQ schemes to the MFCC vector. This new deep generative network based codec works fast because the WaveGlow model abandons the sample-by-sample autoregressive mechanism. We evaluated this new approach over the multi-speaker TIMIT corpus, and…
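To make the link between the VQ scheme and the bit-rate concrete, here is a minimal sketch of split vector quantization over 80-dimensional feature vectors. It is not the authors' implementation: the random codebooks, the split into equal sub-vectors, the codebook size of 1024, and the frame rate of 50 frames per second are all assumptions chosen for illustration, and the function names are hypothetical. The point is only that the bit-rate is the number of index bits per frame times the frame rate, so changing the VQ scheme changes the bit-rate.

```python
import numpy as np


def vq_encode(frames, codebooks):
    """Split each frame into sub-vectors and return nearest-codeword indices."""
    # Column positions at which to cut each frame (assumed split layout).
    splits = np.cumsum([cb.shape[1] for cb in codebooks])[:-1]
    parts = np.split(frames, splits, axis=1)
    indices = []
    for part, cb in zip(parts, codebooks):
        # Squared Euclidean distance from every sub-vector to every codeword.
        dist = ((part[:, None, :] - cb[None, :, :]) ** 2).sum(axis=2)
        indices.append(dist.argmin(axis=1))
    return np.stack(indices, axis=1)


def vq_decode(indices, codebooks):
    """Look up codewords and concatenate them back into full vectors."""
    return np.concatenate(
        [cb[indices[:, i]] for i, cb in enumerate(codebooks)], axis=1
    )


def bit_rate(codebooks, frames_per_second):
    """Bits per frame (index bits summed over codebooks) times the frame rate."""
    bits_per_frame = sum((cb.shape[0] - 1).bit_length() for cb in codebooks)
    return bits_per_frame * frames_per_second


rng = np.random.default_rng(0)
FRAMES_PER_SECOND = 50  # assumption, not taken from the paper

# Hypothetical scheme A: four 20-dim sub-vectors, 10-bit codebook each.
scheme_a = [rng.standard_normal((1024, 20)) for _ in range(4)]
# Hypothetical scheme B: two 40-dim sub-vectors, 10-bit codebook each.
scheme_b = [rng.standard_normal((1024, 40)) for _ in range(2)]

features = rng.standard_normal((10, 80))  # stand-in for 10 MFCC frames
codes = vq_encode(features, scheme_a)     # shape (10, 4), integer indices
restored = vq_decode(codes, scheme_a)     # shape (10, 80), quantized vectors

rate_a = bit_rate(scheme_a, FRAMES_PER_SECOND)  # 4 * 10 bits * 50 = 2000 bit/s
rate_b = bit_rate(scheme_b, FRAMES_PER_SECOND)  # 2 * 10 bits * 50 = 1000 bit/s
```

In a real codec the codebooks would be trained (for example with k-means) on MFCC vectors from a speech corpus, and the decoded vectors would condition the WaveGlow synthesizer; both steps are outside the scope of this sketch.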
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Data Compression Techniques
Methods: Affine Coupling · Normalizing Flows · Invertible 1x1 Convolution · WaveGlow
