BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec

Detai Xin; Xu Tan; Shinnosuke Takamichi; Hiroshi Saruwatari

arXiv:2409.05377 · eess.AS · September 10, 2024

TL;DR

BigCodec is a large-scale neural speech codec that significantly improves low-bitrate speech quality by scaling model size, integrating sequential models, and optimizing quantization, outperforming existing codecs at similar bitrates.

Contribution

We introduce BigCodec, a neural speech codec with 159M parameters that surpasses existing low-bitrate codecs through model scaling and architectural innovations.

Findings

1. Outperforms existing low-bitrate codecs in objective and subjective evaluations.
2. Achieves performance comparable to higher-bitrate codecs.
3. Provides better perceptual quality than the ground truth.

Abstract

We present BigCodec, a low-bitrate neural speech codec. While recent neural speech codecs have shown impressive progress, their performance deteriorates significantly at low bitrates (around 1 kbps). Although a low bitrate inherently restricts performance, other factors, such as model capacity, also hinder further improvement. To address this problem, we scale the model up to 159M parameters, more than 10 times larger than popular codecs with about 10M parameters. In addition, we integrate sequential models into traditional convolutional architectures to better capture temporal dependencies, and we adopt low-dimensional vector quantization to ensure high code utilization. Comprehensive objective and subjective evaluations show that BigCodec, with a bitrate of 1.04 kbps, significantly outperforms several existing low-bitrate codecs. Furthermore, BigCodec achieves objective…
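To make the bitrate figure and the notion of code utilization concrete, here is a minimal sketch of the arithmetic. The frame rate (80 frames per second) and codebook size (8192 entries, single codebook) are assumptions that are not stated in the abstract; they are chosen because they reproduce the quoted 1.04 kbps. The helper names are hypothetical and are not taken from the official repository.

```python
import math


def bitrate_bps(frame_rate_hz: float, codebook_size: int, num_codebooks: int = 1) -> float:
    """Bits per second for a codec that emits `num_codebooks` tokens per frame,
    each token indexing a codebook with `codebook_size` entries."""
    return frame_rate_hz * num_codebooks * math.log2(codebook_size)


def code_utilization(codes, codebook_size: int) -> float:
    """Fraction of codebook entries that appear at least once in `codes`."""
    return len(set(codes)) / codebook_size


# ASSUMPTION: these two values are not given in the abstract.
ASSUMED_FRAME_RATE_HZ = 80
ASSUMED_CODEBOOK_SIZE = 8192

kbps = bitrate_bps(ASSUMED_FRAME_RATE_HZ, ASSUMED_CODEBOOK_SIZE) / 1000
print(f"{kbps:.2f} kbps")

# Toy utilization check: 3 distinct codes used out of a 4-entry codebook.
print(code_utilization([0, 1, 1, 3], codebook_size=4))
```

Under these assumptions each frame costs log2(8192) = 13 bits, so 80 frames per second gives 1040 bits per second. A utilization well below 1.0 would mean part of that bit budget is spent on codebook entries that are never selected, which is the problem the abstract says low-dimensional vector quantization is meant to avoid.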

Code & Models

Repositories

Aria-K-Alethia/BigCodec (PyTorch, official)

Models

neuphonic/distill-neucodec (Hugging Face model, 65k downloads, 19 likes)

Taxonomy

Topics: Speech Recognition and Synthesis