FreeCodec: A disentangled neural speech codec with fewer tokens

Youqiang Zheng; Weiping Tu; Yueteng Kang; Jie Chen; Yike Zhang; Li Xiao; Yuhong Yang; Long Ma

arXiv:2412.01053·cs.SD·July 1, 2025

TL;DR

FreeCodec is a neural speech codec that disentangles speech into timbre, prosody, and content components, and reports better reconstruction and disentanglement than prior codecs while using fewer tokens.

Contribution

The paper proposes an encoding framework that decomposes speech into separate timbre, prosody, and content components, improving coding efficiency and performance over codecs based on residual vector quantization.

Findings

01 Outperforms existing methods in reconstruction quality.

02 Achieves state-of-the-art disentanglement performance.

03 Remains effective with fewer tokens.

Abstract

Neural speech codecs have attracted considerable attention for their high-quality reconstruction from discrete token representations. They are a crucial component in generative tasks such as speech coding and large language models (LLMs). However, most codecs based on residual vector quantization perform worse with fewer tokens, because they code complex, coupled information inefficiently. In this paper, we propose a neural speech codec named FreeCodec, which employs a more effective encoding framework by decomposing the intrinsic properties of speech into different components: 1) a global vector is extracted as the timbre information, 2) a prosody encoder with a long stride is used to model the prosody information, and 3) the content information comes from a content encoder. Using different training strategies, FreeCodec achieves state-of-the-art performance in reconstruction and…
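To make the "fewer tokens" idea concrete, the sketch below compares the token rate of a single residual-vector-quantization stream with that of a content stream plus a long-stride prosody stream. It is only an arithmetic illustration, not the paper's configuration: the sample rate, the strides, the codebook counts, and the names `Stream` and `tokens_per_second` are all assumptions chosen for this example. The global timbre vector is extracted once per utterance, so it is left out of the per-second rate.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stream:
    """One quantized token stream (all values here are hypothetical)."""
    name: str
    stride: int      # waveform samples covered by one frame
    codebooks: int   # tokens emitted per frame


def tokens_per_second(sample_rate: int, streams: List[Stream]) -> float:
    """Sum the token rate over all streams: frames per second times tokens per frame."""
    return sum(sample_rate / s.stride * s.codebooks for s in streams)


SAMPLE_RATE = 16_000  # assumed sample rate, not taken from the paper

# Hypothetical RVQ baseline: one stream, 4 residual codebooks per frame.
rvq = [Stream("rvq", stride=320, codebooks=4)]

# Hypothetical disentangled layout: a content stream at the same stride,
# plus a prosody stream with a 4x longer stride (fewer frames per second).
disentangled = [
    Stream("content", stride=320, codebooks=1),
    Stream("prosody", stride=1280, codebooks=1),
]

rvq_rate = tokens_per_second(SAMPLE_RATE, rvq)
dis_rate = tokens_per_second(SAMPLE_RATE, disentangled)

print(f"RVQ baseline:  {rvq_rate} tokens/s")
print(f"Disentangled:  {dis_rate} tokens/s")
```

Under these assumed numbers, the long prosody stride adds only a fraction of the content stream's rate, which is why a decomposed layout can carry prosody without approaching the token budget of a multi-codebook RVQ stream.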

Code & Models

Repositories

exercise-book-yq/FreeCodec (official)

Taxonomy

Topics: Neural Networks and Applications · Speech Recognition and Synthesis · Speech and Audio Processing