FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec
Zhihao Du, Shiliang Zhang, Kai Hu, Siqi Zheng

TL;DR
FunCodec is an open-source neural speech codec toolkit that offers reproducible training, easy integration, and state-of-the-art models like SoundStream and Encodec, supporting downstream tasks such as speech recognition.
Contribution
The paper introduces FunCodec, a unified, reproducible toolkit with pre-trained models, and proposes frequency-domain codec models (FreqCodec) that reach comparable speech quality with much lower computation and parameter complexity.
Findings
Under the same compression ratio, FunCodec achieves better reconstruction quality than other toolkits and released models.
FreqCodec achieves comparable speech quality with much lower computation and parameter complexity.
Pre-trained models are effective for speech recognition and TTS.
Abstract
This paper presents FunCodec, a fundamental neural speech codec toolkit, which is an extension of the open-source speech processing toolkit FunASR. FunCodec provides reproducible training recipes and inference scripts for the latest neural speech codec models, such as SoundStream and Encodec. Thanks to the unified design with FunASR, FunCodec can be easily integrated into downstream tasks, such as speech recognition. Along with FunCodec, pre-trained models are also provided, which can be used for academic or generalized purposes. Based on the toolkit, we further propose the frequency-domain codec models, FreqCodec, which can achieve comparable speech quality with much lower computation and parameter complexity. Experimental results show that, under the same compression ratio, FunCodec can achieve better reconstruction quality compared with other toolkits and released models. We also…
Code & Models
- 🤗 alibaba-damo/audio_codec-encodec-en-libritts-16k-nq32ds320-pytorch · model · 8 dl · ♡ 1
- 🤗 alibaba-damo/audio_codec-encodec-en-libritts-16k-nq32ds640-pytorch · model · 7 dl · ♡ 1
- 🤗 alibaba-damo/audio_codec-encodec-zh_en-general-16k-nq32ds320-pytorch · model · 6 dl · ♡ 1
- 🤗 alibaba-damo/audio_codec-encodec-zh_en-general-16k-nq32ds640-pytorch · model · 4 dl · ♡ 3
- 🤗 alibaba-damo/audio_codec-freqcodec_magphase-en-libritts-16k-gr1nq32ds320-pytorch · model · 3 dl
- 🤗 alibaba-damo/audio_codec-freqcodec_magphase-en-libritts-16k-gr8nq32ds320-pytorch · model · 4 dl · ♡ 1
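
The model names above appear to encode their configuration. The following is a minimal sketch for reading those fields, assuming (this interpretation is not stated on this page) that `16k` is the sample rate in kHz, `nq` is the number of quantizers, `ds` is the total downsampling factor, and `gr` is a group count used by the FreqCodec variants. The helper `parse_model_id` is hypothetical and not part of FunCodec.

```python
import re


def parse_model_id(model_id: str) -> dict:
    """Parse configuration fields out of a model name (hypothetical helper).

    Assumed meaning of the fields (not confirmed by the source):
    '16k' = sample rate in kHz, 'gr' = group count, 'nq' = number of
    quantizers, 'ds' = downsampling factor.
    """
    name = model_id.split("/")[-1]  # drop the organization prefix
    match = re.search(r"-(\d+)k-(?:gr(\d+))?nq(\d+)ds(\d+)-", name)
    if match is None:
        raise ValueError(f"unrecognized model name: {model_id}")

    sample_rate_hz = int(match.group(1)) * 1000
    groups = int(match.group(2)) if match.group(2) else None
    num_quantizers = int(match.group(3))
    downsample = int(match.group(4))

    return {
        "sample_rate_hz": sample_rate_hz,
        "groups": groups,
        "num_quantizers": num_quantizers,
        "downsample": downsample,
        # Under the assumption above, frames per second of encoded audio.
        "frame_rate_hz": sample_rate_hz / downsample,
    }


if __name__ == "__main__":
    info = parse_model_id(
        "alibaba-damo/audio_codec-encodec-en-libritts-16k-nq32ds320-pytorch"
    )
    print(info)
```

If the assumed reading is right, the `ds320` models would emit 50 frames per second at 16 kHz and the `ds640` models 25, which is one way to compare the listed checkpoints before downloading them.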
Taxonomy
Topics: Speech Recognition and Synthesis · Topic Modeling · Neural Networks and Applications
