BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec
Detai Xin, Xu Tan, Shinnosuke Takamichi, Hiroshi Saruwatari

TL;DR
BigCodec is a large-scale neural speech codec that improves low-bitrate speech quality by scaling up model size, integrating sequential models into a convolutional architecture, and adopting low-dimensional vector quantization. It outperforms existing codecs at similar bitrates.
Contribution
We introduce BigCodec, a neural speech codec with 159M parameters that surpasses existing low-bitrate codecs by scaling up the model, adding sequential models to the convolutional architecture, and using low-dimensional vector quantization.
Findings
At 1.04 kbps, outperforms several existing low-bitrate codecs in both objective and subjective evaluations.
Achieves objective performance comparable to codecs operating at higher bitrates.
Receives higher subjective perceptual-quality ratings than the ground-truth recordings.
Abstract
We present BigCodec, a low-bitrate neural speech codec. While recent neural speech codecs have shown impressive progress, their performance deteriorates significantly at low bitrates (around 1 kbps). Although a low bitrate inherently restricts performance, other factors, such as model capacity, also hinder further improvements. To address this problem, we scale up the model size to 159M parameters, more than 10 times larger than popular codecs, which have about 10M parameters. In addition, we integrate sequential models into traditional convolutional architectures to better capture temporal dependencies, and we adopt low-dimensional vector quantization to ensure high code utilization. Comprehensive objective and subjective evaluations show that BigCodec, with a bitrate of 1.04 kbps, significantly outperforms several existing low-bitrate codecs. Furthermore, BigCodec achieves objective…
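To make the quantization and bitrate claims in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single codebook, a nearest-neighbour lookup after a linear down-projection, and toy sizes chosen only for illustration. The helper names (`bitrate_bps`, `quantize`, `code_utilization`) are hypothetical. The frame rate of 80 frames per second and codebook size of 8192 are assumptions, not values stated in the text above; they are simply one configuration that yields the quoted 1.04 kbps.

```python
import math
import random


def bitrate_bps(frames_per_second, codebook_size, num_codebooks=1):
    """Bits per second when each frame emits `num_codebooks` code indices."""
    return frames_per_second * num_codebooks * math.log2(codebook_size)


def project(vec, matrix):
    """Multiply a length-D vector by a D x d matrix given as a list of rows."""
    d = len(matrix[0])
    return [sum(vec[i] * matrix[i][j] for i in range(len(vec))) for j in range(d)]


def nearest_code(z, codebook):
    """Index of the codebook entry closest to z in squared Euclidean distance."""
    def sq_dist(code):
        return sum((a - b) ** 2 for a, b in zip(z, code))
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k]))


def quantize(latents, down_proj, codebook):
    """Project each latent to the low-dimensional space, then look up a code."""
    return [nearest_code(project(x, down_proj), codebook) for x in latents]


def code_utilization(indices, codebook_size):
    """Fraction of codebook entries selected at least once."""
    return len(set(indices)) / codebook_size


# Toy sizes for illustration only; these are NOT the paper's settings.
rng = random.Random(0)
T, D_MODEL, D_LOW, K = 200, 16, 4, 32
latents = [[rng.gauss(0, 1) for _ in range(D_MODEL)] for _ in range(T)]
down_proj = [[rng.gauss(0, 1) / math.sqrt(D_MODEL) for _ in range(D_LOW)]
             for _ in range(D_MODEL)]
codebook = [[rng.gauss(0, 1) for _ in range(D_LOW)] for _ in range(K)]

indices = quantize(latents, down_proj, codebook)
utilization = code_utilization(indices, K)

# Assumed configuration: 80 frames/s and a 8192-entry codebook (13 bits per frame).
example_bitrate = bitrate_bps(frames_per_second=80, codebook_size=8192)
print(f"utilization = {utilization:.2f}, bitrate = {example_bitrate / 1000:.2f} kbps")
```

The intuition behind quantizing in a low-dimensional space is that nearest-neighbour search over a few dimensions tends to spread inputs across more codebook entries, which is what the utilization ratio measures. In a trained codec the projection and codebook would be learned rather than sampled at random.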
Taxonomy
Topics: Speech Recognition and Synthesis
