FocalCodec: Low-Bitrate Speech Coding via Focal Modulation Networks
Luca Della Libera, Francesco Paissan, Cem Subakan, Mirco Ravanelli

TL;DR
FocalCodec is a low-bitrate speech codec that combines focal modulation with a single binary codebook, achieving efficient compression and high-quality speech resynthesis and voice conversion, including on multilingual and noisy speech.
Contribution
It introduces FocalCodec, a low-bitrate speech codec based on focal modulation that simplifies the architecture by using a single codebook, and that delivers competitive quality at lower bitrates than existing methods.
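For readers unfamiliar with focal modulation, the sketch below applies the generic operator from Focal Modulation Networks (FocalNets) to a sequence of audio frames. It is a minimal NumPy illustration, assuming 1D depthwise convolutions over time, two focal levels with kernel sizes 3 and 5, and random weights; the sizes, parameter names (`w_q`, `w_ctx`, `w_gate`, `w_h`, `w_out`) and helper functions are hypothetical and do not describe FocalCodec's actual encoder or decoder.

```python
import numpy as np

def gelu(x: np.ndarray) -> np.ndarray:
    """Tanh approximation of the GELU activation."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def depthwise_conv1d(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Filter each channel of x (T, C) along time with its own kernel column (K, C).

    K must be odd; zero 'same' padding keeps the output length equal to T.
    """
    k = kernel.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        out[t] = np.sum(xp[t:t + k] * kernel, axis=0)
    return out

def focal_modulation(x: np.ndarray, params: dict, num_levels: int) -> np.ndarray:
    """Generic focal modulation over a (T, C) sequence of frames (hypothetical weights)."""
    q = x @ params["w_q"]            # queries, (T, C)
    ctx = x @ params["w_ctx"]        # initial context, (T, C)
    gates = x @ params["w_gate"]     # one gate per level plus a global gate, (T, L + 1)
    ctx_all = np.zeros_like(ctx)
    # Hierarchical contextualization: each level widens the temporal receptive field.
    for level in range(num_levels):
        ctx = gelu(depthwise_conv1d(ctx, params["kernels"][level]))
        ctx_all += ctx * gates[:, level:level + 1]
    # Global context: average over all frames, weighted by the last gate.
    ctx_global = gelu(ctx.mean(axis=0, keepdims=True))
    ctx_all += ctx_global * gates[:, num_levels:]
    # Gated aggregation yields a modulator that multiplies the queries element-wise.
    modulator = ctx_all @ params["w_h"]
    return (q * modulator) @ params["w_out"]

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
T, C, num_levels = 20, 8, 2
kernel_sizes = [3, 5]
params = {
    "w_q": rng.normal(size=(C, C)) / np.sqrt(C),
    "w_ctx": rng.normal(size=(C, C)) / np.sqrt(C),
    "w_gate": rng.normal(size=(C, num_levels + 1)) / np.sqrt(C),
    "kernels": [rng.normal(size=(k, C)) / k for k in kernel_sizes],
    "w_h": rng.normal(size=(C, C)) / np.sqrt(C),
    "w_out": rng.normal(size=(C, C)) / np.sqrt(C),
}
x = rng.normal(size=(T, C))
out = focal_modulation(x, params, num_levels)
print(out.shape)  # (20, 8)
```

Unlike self-attention, this operator gathers context with convolutions of growing receptive field plus one global average, so its cost grows linearly with the number of frames rather than quadratically.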
Findings
Achieves speech compression between 0.16 and 0.65 kbps.
Delivers competitive speech resynthesis and voice conversion at lower bitrates than the current state of the art.
Handles multilingual speech and noisy environments effectively.
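With a single codebook, the bitrate is simply the token rate times the bits per token. The sketch below shows this arithmetic under two assumptions that are not stated in the summary above: that the "50hz", "25hz" and "12_5hz" suffixes of the released checkpoints denote token frame rates, and that the codebook has 2^13 = 8192 entries (13 bits per token). Under these assumptions the results line up with the stated 0.16 to 0.65 kbps range; the function name `bitrate_kbps` is hypothetical.

```python
import math

def bitrate_kbps(frame_rate_hz: float, codebook_size: int) -> float:
    """Bitrate of a single-codebook codec: tokens per second times bits per token."""
    bits_per_token = math.log2(codebook_size)
    return frame_rate_hz * bits_per_token / 1000.0

# Assumption: a 13-bit codebook (2**13 = 8192 codes) at the frame rates
# suggested by the checkpoint names (50, 25 and 12.5 Hz).
CODEBOOK_SIZE = 2 ** 13
for rate in (50.0, 25.0, 12.5):
    print(f"{rate:>5} Hz -> {bitrate_kbps(rate, CODEBOOK_SIZE):.4f} kbps")
```

This prints 0.6500, 0.3250 and 0.1625 kbps, so halving the frame rate halves the bitrate while the codebook, and therefore the per-token vocabulary seen by a downstream language model, stays the same.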
Abstract
Large language models have revolutionized natural language processing through self-supervised pretraining on massive datasets. Inspired by this success, researchers have explored adapting these methods to speech by discretizing continuous audio into tokens using neural audio codecs. However, existing approaches face limitations, including high bitrates, the loss of either semantic or acoustic information, and the reliance on multi-codebook designs when trying to capture both, which increases architectural complexity for downstream tasks. To address these challenges, we introduce FocalCodec, an efficient low-bitrate codec based on focal modulation that utilizes a single binary codebook to compress speech between 0.16 and 0.65 kbps. FocalCodec delivers competitive performance in speech resynthesis and voice conversion at lower bitrates than the current state-of-the-art, while effectively…
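A binary codebook can be realized without storing any codebook vectors: each latent dimension is quantized to its sign, and the resulting bit pattern is read as the token index. The sketch below illustrates this generic sign-based (lookup-free style) binary quantization, assuming a hypothetical 13-dimensional latent frame; FocalCodec's actual quantizer may differ (for example by normalizing the latent first), and the helper names are hypothetical.

```python
from typing import List, Sequence

def binary_quantize(latent: Sequence[float]) -> List[float]:
    """Map each latent dimension to -1 or +1 by its sign (0 maps to +1)."""
    return [1.0 if x >= 0 else -1.0 for x in latent]

def code_to_index(code: Sequence[float]) -> int:
    """Read the +/-1 code as a binary number (+1 -> bit 1), most significant bit first."""
    index = 0
    for value in code:
        index = (index << 1) | (1 if value > 0 else 0)
    return index

def index_to_code(index: int, num_bits: int) -> List[float]:
    """Inverse of code_to_index: recover the +/-1 code from a token index."""
    return [1.0 if (index >> (num_bits - 1 - i)) & 1 else -1.0 for i in range(num_bits)]

# Hypothetical 13-dimensional latent frame from an encoder.
frame = [0.3, -1.2, 0.05, -0.7, 2.1, -0.01, 0.9, 0.4, -0.3, -2.0, 1.1, 0.0, -0.6]
code = binary_quantize(frame)
token = code_to_index(code)          # an integer in [0, 2**13)
assert index_to_code(token, len(code)) == code
print(token)
```

Because the code vector is fully determined by the token index, the decoder can recover it from the index alone, which is what keeps a single-codebook design simple for downstream tasks.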
Code & Models
- 🤗 lucadellalib/focalcodec_50hz (model) · 1.3k downloads · ♡ 1
- 🤗 lucadellalib/focalcodec_25hz (model) · 194 downloads · ♡ 1
- 🤗 lucadellalib/focalcodec_12_5hz (model) · 532 downloads · ♡ 2
- 🤗 lucadellalib/focalcodec_50hz_65k_causal (model) · 19 downloads
- 🤗 lucadellalib/focalcodec_50hz_4k_causal (model) · 372 downloads
- 🤗 lucadellalib/focalcodec_50hz_2k_causal (model) · 1.6k downloads
- 🤗 lucadellalib/dycast (model) · 86 downloads · ♡ 3
Taxonomy
Topics: Advanced Data Compression Techniques · Speech and Audio Processing · Speech Recognition and Synthesis
