FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec

Zhihao Du; Shiliang Zhang; Kai Hu; Siqi Zheng

arXiv:2309.07405·cs.SD·October 10, 2023

Open Access · 1 Repo · 6 Models

TL;DR

FunCodec is an open-source neural speech codec toolkit, built as an extension of FunASR, that provides reproducible training recipes and inference scripts for recent codec models such as SoundStream and Encodec, and integrates easily into downstream tasks such as speech recognition.

Contribution

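The paper introduces FunCodec, a reproducible toolkit that shares a unified design with FunASR and ships with pre-trained models. It also proposes FreqCodec, a family of frequency-domain codec models that reach comparable speech quality with much lower computation and parameter complexity.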

Findings

1. Under the same compression ratio, FunCodec achieves better reconstruction quality than other toolkits and released models.

2. FreqCodec achieves comparable speech quality with much lower computation and parameter complexity.

3. Pre-trained models are effective for speech recognition and TTS.

Abstract

This paper presents FunCodec, a fundamental neural speech codec toolkit, which is an extension of the open-source speech processing toolkit FunASR. FunCodec provides reproducible training recipes and inference scripts for the latest neural speech codec models, such as SoundStream and Encodec. Thanks to the unified design with FunASR, FunCodec can be easily integrated into downstream tasks, such as speech recognition. Along with FunCodec, pre-trained models are also provided, which can be used for academic or generalized purposes. Based on the toolkit, we further propose the frequency-domain codec models, FreqCodec, which can achieve comparable speech quality with much lower computation and parameter complexity. Experimental results show that, under the same compression ratio, FunCodec can achieve better reconstruction quality compared with other toolkits and released models. We also…

Code & Models

Repositories

alibaba-damo-academy/funcodec (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Neural Networks and Applications