DualCodec: A Low-Frame-Rate, Semantically-Enhanced Neural Audio Codec for Speech Generation

Jiaqi Li; Xiaolong Lin; Zhekai Li; Shixi Huang; Yuancheng Wang; Chaoren Wang; Zhenpeng Zhan; Zhizheng Wu

arXiv:2505.13000 · cs.SD · October 2, 2025

TL;DR

DualCodec is a neural audio codec that combines self-supervised (SSL) semantic representations with waveform representations. This lets it produce high-quality speech at low frame rates, which makes LM-based speech generation more efficient.

Contribution

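DualCodec introduces a dual-stream encoding framework that integrates SSL and waveform representations end to end. This strengthens the semantic information in the first-layer tokens of a low-frame-rate codec, and the paper reports that the approach outperforms existing state-of-the-art codecs.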
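To make the dual-stream idea concrete, here is a toy residual quantizer for a single frame. It is a minimal sketch under stated assumptions, not DualCodec's implementation. It assumes three things. First, the first-layer token is chosen by quantizing the SSL feature alone. Second, every later layer quantizes what remains of the waveform feature after the previous layers' codewords are subtracted. Third, both streams share one feature dimension, so no projection is needed. The helper names (`nearest`, `subtract`, `dual_stream_encode`), the codebooks, and the feature values are all hypothetical. In a real codec, the features come from neural encoders and the codebooks are learned.

```python
from typing import List, Sequence, Tuple

# Hypothetical toy illustrating a dual-stream residual quantizer;
# this is NOT code from the DualCodec repository.
Vector = List[float]


def nearest(codebook: Sequence[Vector], x: Vector) -> int:
    """Index of the codeword closest to x in squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], x)),
    )


def subtract(a: Vector, b: Vector) -> Vector:
    """Element-wise a - b."""
    return [x - y for x, y in zip(a, b)]


def dual_stream_encode(
    ssl_feat: Vector,
    wave_feat: Vector,
    semantic_codebook: Sequence[Vector],
    acoustic_codebooks: Sequence[Sequence[Vector]],
) -> Tuple[List[int], Vector]:
    """Encode one frame into [semantic token, acoustic tokens...].

    Assumption: layer 1 sees only the SSL feature; each later layer
    quantizes the residual of the waveform feature after all earlier
    codewords (semantic included) have been subtracted.
    """
    first = nearest(semantic_codebook, ssl_feat)
    tokens = [first]
    recon = list(semantic_codebook[first])
    residual = subtract(wave_feat, recon)
    for codebook in acoustic_codebooks:
        idx = nearest(codebook, residual)
        tokens.append(idx)
        # Accumulate the reconstruction and shrink the residual.
        recon = [r + c for r, c in zip(recon, codebook[idx])]
        residual = subtract(residual, codebook[idx])
    return tokens, recon


# Tiny hand-made codebooks (hypothetical values, chosen to be exact in float).
semantic_cb = [[1.0, 0.0], [0.0, 1.0]]
acoustic_cbs = [
    [[0.5, 0.0], [0.0, 0.5], [0.0, 0.0]],
    [[0.25, 0.0], [0.0, 0.25], [0.0, 0.0]],
]
tokens, recon = dual_stream_encode([0.75, 0.25], [1.5, 0.25], semantic_cb, acoustic_cbs)
print(tokens, recon)  # [0, 0, 1] [1.5, 0.25]
```

Because `tokens[0]` depends only on `ssl_feat`, the first layer carries whatever the SSL stream encodes, while the acoustic layers refine the reconstruction of the waveform feature. This mirrors the property the abstract attributes to DualCodec's first-layer tokens. How the real model trains and combines the two streams is specified in the paper, not in this toy.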

Findings

1. Outperforms Mimi, SpeechTokenizer, DAC, and EnCodec in the reported experiments.
2. Maintains high audio quality at low frame rates.
3. Enhances the semantic information carried by first-layer codec tokens, which benefits LM-based speech generation.

Abstract

Neural audio codecs form the foundational building blocks for language model (LM)-based speech generation. Typically, there is a trade-off between frame rate and audio quality. This study introduces a low-frame-rate, semantically enhanced codec model. Existing approaches distill semantically rich self-supervised learning (SSL) representations into the first-layer codec tokens. This work proposes DualCodec, a dual-stream encoding approach that integrates SSL and waveform representations within an end-to-end codec framework. In this setting, DualCodec enhances the semantic information in the first-layer codec tokens and enables the codec system to maintain high audio quality while operating at a low frame rate. Note that a low-frame-rate codec improves the efficiency of speech generation. Experimental results on audio codec and speech generation tasks confirm the effectiveness of the proposed DualCodec…
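The efficiency argument is largely arithmetic. A codec's token rate is its frame rate times its number of codebooks. An LM that generates one token per frame therefore needs a number of steps proportional to the frame rate. The sketch below assumes a generic residual-VQ codec and a hypothetical pipeline in which an autoregressive LM predicts only first-layer tokens. The function names and all example numbers (75 Hz vs 12.5 Hz, 8 codebooks of 1024 entries, a 10 s utterance) are illustrative assumptions, not DualCodec's configuration.

```python
import math

# Illustrative arithmetic for a generic residual-VQ codec (assumed setup,
# not DualCodec's actual configuration).


def tokens_per_second(frame_rate_hz: float, num_codebooks: int) -> float:
    """Discrete tokens emitted per second of audio: one per codebook per frame."""
    return frame_rate_hz * num_codebooks


def bitrate_bps(frame_rate_hz: float, num_codebooks: int, codebook_size: int) -> float:
    """Bits per second if each token indexes a codebook of `codebook_size` entries."""
    return tokens_per_second(frame_rate_hz, num_codebooks) * math.log2(codebook_size)


def first_layer_steps(duration_s: float, frame_rate_hz: float) -> int:
    """Steps for a hypothetical autoregressive LM predicting one first-layer token per frame."""
    return math.ceil(duration_s * frame_rate_hz)


# Compare a higher and a lower frame rate for a 10-second utterance.
high_rate_steps = first_layer_steps(10.0, 75.0)  # 750 steps
low_rate_steps = first_layer_steps(10.0, 12.5)   # 125 steps
print(high_rate_steps / low_rate_steps)          # 6.0
print(bitrate_bps(12.5, 8, 1024))                # 1000.0
```

The step count shrinks linearly with frame rate, which is why a low frame rate speeds up generation. What this arithmetic cannot show is the other side of the trade-off, audio quality, which is the part DualCodec's dual-stream design targets.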

Code & Models

Repositories

jiaqili3/DualCodec (PyTorch, official)
