Semantic Codebooks as Effective Priors for Neural Speech Compression

Liuyang Bai; Weiyi Lu; Li Guo

arXiv:2512.21653 · cs.SD · December 29, 2025

TL;DR

SemDAC is a semantic-aware neural speech codec that uses semantic codebooks as priors. It outperforms DAC on perceptual metrics and yields lower WER in downstream recognition while operating at lower bitrates.

Contribution

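The paper presents SemDAC, a neural speech codec in which the first quantizer of a residual vector quantization (RVQ) stack is distilled from HuBERT features to produce semantic tokens, later quantizers model the residual acoustics, and a FiLM-conditioned decoder reconstructs audio conditioned on the semantic tokens.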

Findings

01. Outperforms DAC on perceptual metrics

02. Achieves lower WER on Whisper recognition

03. Operates at substantially lower bitrates (e.g., 0.95 kbps vs. 2.5 kbps)
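
For context on the bitrate figures, the rate of a fixed-rate RVQ token stream with equal-sized codebooks is the frame rate times the number of codebooks times the bits per index. The sketch below illustrates that arithmetic. The frame rate (50 Hz) and codebook size (1024) are assumed values chosen for illustration, not settings taken from the paper, and the helper name `rvq_bitrate_kbps` is hypothetical.

```python
import math


def rvq_bitrate_kbps(frame_rate_hz: float, num_quantizers: int, codebook_size: int) -> float:
    """Bitrate in kbps: frames per second * codebooks per frame * bits per index."""
    bits_per_index = math.log2(codebook_size)
    return frame_rate_hz * num_quantizers * bits_per_index / 1000.0


# Illustrative settings only (assumed, not from the paper).
one_codebook = rvq_bitrate_kbps(frame_rate_hz=50, num_quantizers=1, codebook_size=1024)
four_codebooks = rvq_bitrate_kbps(frame_rate_hz=50, num_quantizers=4, codebook_size=1024)

# Relative saving implied by the two bitrates quoted in the findings:
# 0.95 / 2.5 = 0.38, so the lower rate uses about 62% fewer bits.
saving = 1.0 - 0.95 / 2.5

print(one_codebook, four_codebooks, round(saving, 2))
```

Under these assumed settings, each extra codebook adds the same fixed cost per frame, which is why making the first codebook carry more useful information can reduce the number of acoustic codebooks needed.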

Abstract

Speech codecs are traditionally optimized for waveform fidelity, allocating bits to preserve acoustic detail even when much of it can be inferred from linguistic structure. This leads to inefficient compression and suboptimal performance on downstream recognition tasks. We propose SemDAC, a semantic-aware neural audio codec that leverages semantic codebooks as effective priors for speech compression. In SemDAC, the first quantizer in a residual vector quantization (RVQ) stack is distilled from HuBERT features to produce semantic tokens that capture phonetic content, while subsequent quantizers model residual acoustics. A FiLM-conditioned decoder reconstructs audio conditioned on the semantic tokens, improving efficiency in the use of acoustic codebooks. Despite its simplicity, this design proves highly effective: SemDAC outperforms DAC across perceptual metrics and achieves lower WER…
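
The two mechanisms named in the abstract, residual vector quantization and FiLM conditioning, can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the codebooks and FiLM weights are random rather than trained, the first codebook merely stands in for the HuBERT-distilled semantic quantizer, and all names (`nearest_code`, `rvq_encode`, `rvq_decode`, `film`) and dimensions are hypothetical.

```python
import numpy as np


def nearest_code(x, codebook):
    """Index of the nearest codebook row (squared Euclidean) for each row of x."""
    dists = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


def rvq_encode(latents, codebooks):
    """Residual VQ: each stage quantizes what the previous stages left over."""
    residual = latents.copy()
    indices = []
    for cb in codebooks:
        idx = nearest_code(residual, cb)
        indices.append(idx)
        residual = residual - cb[idx]
    return indices, residual


def rvq_decode(indices, codebooks):
    """Sum the selected code vectors from every stage."""
    return sum(cb[idx] for idx, cb in zip(indices, codebooks))


def film(features, cond, w_gamma, w_beta):
    """FiLM: per-channel scale and shift predicted from the conditioning input."""
    gamma = cond @ w_gamma
    beta = cond @ w_beta
    return gamma * features + beta


rng = np.random.default_rng(0)
T, D, K = 20, 8, 16  # frames, latent dim, codebook size (all assumed)
latents = rng.normal(size=(T, D))
codebooks = [rng.normal(size=(K, D)) for _ in range(3)]

indices, residual = rvq_encode(latents, codebooks)
semantic_tokens = indices[0]                 # stage 1: stand-in for semantic tokens
semantic_emb = codebooks[0][semantic_tokens]
acoustic = rvq_decode(indices[1:], codebooks[1:])  # later stages: residual acoustics

# Hypothetical decoder step: modulate acoustic features with semantic embeddings.
w_gamma = rng.normal(size=(D, D)) * 0.1
w_beta = rng.normal(size=(D, D)) * 0.1
conditioned = film(acoustic, semantic_emb, w_gamma, w_beta)
print(semantic_tokens.shape, conditioned.shape)
```

In a trained codec the first codebook would be learned with a distillation objective against HuBERT features and the FiLM parameters would be produced inside the decoder network. The sketch only shows how the token streams and the conditioning signal fit together.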

Taxonomy

Topics: Speech Recognition and Synthesis · Advanced Data Compression Techniques · Speech and Audio Processing