Optimizing Neural Speech Codec for Low-Bitrate Compression via Multi-Scale Encoding
Peiji Yang, Fengping Wang, Yicheng Zhong, Huawei Wei, Zhisheng Wang

TL;DR
This paper introduces MsCodec, a multi-scale neural speech codec that encodes speech at different time scales and uses a mutual information loss to improve speech compression quality at low bitrates.
Contribution
MsCodec is a novel multi-scale neural speech codec that decouples speech features by time scale and enhances code diversity with mutual information loss.
Findings
Significant improvement in low-bitrate speech compression quality.
Effective decoupling of speech features across multiple time scales.
Enhanced diversity of speech codes through mutual information loss.
Abstract
Neural speech codecs have demonstrated their ability to compress high-quality speech and audio by converting them into discrete token representations. Most existing methods utilize Residual Vector Quantization (RVQ) to encode speech into multiple layers of discrete codes with uniform time scales. However, this strategy overlooks the differences in information density across various speech features, leading to redundant encoding of sparse information, which limits the performance of these methods at low bitrate. This paper proposes MsCodec, a novel multi-scale neural speech codec that encodes speech into multiple layers of discrete codes, each corresponding to a different time scale. This encourages the model to decouple speech features according to their diverse information densities, consequently enhancing the performance of speech compression. Furthermore, we incorporate mutual…
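The bitrate argument in the abstract can be made concrete with a little arithmetic. The sketch below is not taken from the paper: the frame rates, the codebook sizes, and the helper name `codec_bitrate` are hypothetical, chosen only for illustration. It assumes each layer emits one code index per frame, costing log2(codebook size) bits.

```python
import math

def codec_bitrate(layers):
    """Return bits per second for a list of (frame_rate_hz, codebook_size) layers.

    Assumes each layer emits one code index per frame, costing
    log2(codebook_size) bits. All values here are illustrative, not MsCodec's.
    """
    return sum(rate * math.log2(size) for rate, size in layers)

# Hypothetical uniform-scale RVQ: four layers, all at 50 frames per second.
uniform = [(50.0, 1024)] * 4

# Hypothetical multi-scale layout: coarser layers run at lower frame rates.
multi_scale = [(50.0, 1024), (25.0, 1024), (12.5, 1024)]

uniform_bps = codec_bitrate(uniform)
multi_scale_bps = codec_bitrate(multi_scale)
print(uniform_bps, multi_scale_bps)
```

Under these assumed settings, running the layers that carry sparser information at lower frame rates reduces the token budget without shrinking any codebook. Whether reconstruction quality holds at that budget is an empirical question, which is what the paper evaluates.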
Taxonomy
Topics: Advanced Data Compression Techniques · Neural Networks and Applications · Speech and Audio Processing
