Semantic Codebooks as Effective Priors for Neural Speech Compression
Liuyang Bai, Weiyi Lu, Li Guo

TL;DR
SemDAC is a semantic-aware neural speech codec that uses semantic codebooks as priors, improving compression efficiency and downstream recognition performance while operating at lower bitrates.
Contribution
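The paper presents SemDAC, a neural speech codec in which the first quantizer of a residual vector quantization (RVQ) stack is distilled from HuBERT features to capture phonetic content, the remaining quantizers model residual acoustics, and a FiLM-conditioned decoder reconstructs audio conditioned on the semantic tokens.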
Findings
Outperforms DAC on perceptual metrics
Achieves lower word error rate (WER) with Whisper recognition
Operates at substantially lower bitrates (e.g., 0.95 kbps vs. 2.5 kbps)
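To put the bitrate comparison in context, here is a minimal sketch of how the bitrate of an RVQ token stream is usually computed: each frame emits one index per codebook, and each index costs log2(codebook size) bits. The function name, the 75 Hz frame rate, and the 1024-entry codebooks are illustrative assumptions, not settings reported for SemDAC or DAC.

```python
import math

def rvq_bitrate_kbps(frame_rate_hz, codebook_sizes):
    """Bitrate of an RVQ token stream: one index per codebook per frame."""
    bits_per_frame = sum(math.log2(size) for size in codebook_sizes)
    return frame_rate_hz * bits_per_frame / 1000.0

# Hypothetical settings (NOT from the paper): 75 frames/s, 1024-entry codebooks.
two_books = rvq_bitrate_kbps(75, [1024, 1024])   # 2 codebooks x 10 bits x 75 Hz
four_books = rvq_bitrate_kbps(75, [1024] * 4)    # 4 codebooks x 10 bits x 75 Hz
print(two_books, four_books)
```

Under these assumed settings, dropping from four codebooks to two halves the bitrate, which is the sort of saving a codec can reach if fewer acoustic codebooks are needed for the same quality.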
Abstract
Speech codecs are traditionally optimized for waveform fidelity, allocating bits to preserve acoustic detail even when much of it can be inferred from linguistic structure. This leads to inefficient compression and suboptimal performance on downstream recognition tasks. We propose SemDAC, a semantic-aware neural audio codec that leverages semantic codebooks as effective priors for speech compression. In SemDAC, the first quantizer in a residual vector quantization (RVQ) stack is distilled from HuBERT features to produce semantic tokens that capture phonetic content, while subsequent quantizers model residual acoustics. A FiLM-conditioned decoder reconstructs audio conditioned on the semantic tokens, improving efficiency in the use of acoustic codebooks. Despite its simplicity, this design proves highly effective: SemDAC outperforms DAC across perceptual metrics and achieves lower WER…
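The two mechanisms named in the abstract, residual vector quantization and FiLM conditioning, can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not the authors' implementation: the codebooks and projection matrices are random rather than learned, the first codebook is not actually distilled from HuBERT, and the helper names (`nearest_code`, `rvq_encode`, `film`) are hypothetical.

```python
import numpy as np

def nearest_code(x, codebook):
    """Return the index and vector of the closest codebook entry for each frame."""
    # x: (T, D), codebook: (K, D) -> squared distances of shape (T, K)
    dists = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)
    return idx, codebook[idx]

def rvq_encode(latents, codebooks):
    """Residual VQ: each stage quantizes what the previous stages left over."""
    residual = latents.copy()
    quantized_sum = np.zeros_like(latents)
    indices = []
    for codebook in codebooks:
        idx, q = nearest_code(residual, codebook)
        indices.append(idx)
        quantized_sum += q
        residual = residual - q
    return indices, quantized_sum

def film(features, cond, w_gamma, w_beta):
    """FiLM: per-feature scale and shift predicted from the conditioning input."""
    gamma = cond @ w_gamma
    beta = cond @ w_beta
    return gamma * features + beta

# Toy sizes and random parameters (assumptions for illustration only).
rng = np.random.default_rng(0)
T, D, K = 6, 4, 8
latents = rng.normal(size=(T, D))
codebooks = [rng.normal(size=(K, D)) for _ in range(3)]

indices, quantized = rvq_encode(latents, codebooks)
# Stage 0 plays the role of the semantic codebook; later stages are acoustic.
semantic_emb = codebooks[0][indices[0]]
w_gamma = rng.normal(size=(D, D))
w_beta = rng.normal(size=(D, D))
conditioned = film(quantized, semantic_emb, w_gamma, w_beta)
print(conditioned.shape)
```

In a trained system the first stage would be supervised to match HuBERT features so that its tokens carry phonetic content, and the FiLM projections would be learned layers inside the decoder. How SemDAC wires these pieces together in detail is not specified in the abstract.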
Taxonomy
Topics: Speech Recognition and Synthesis · Advanced Data Compression Techniques · Speech and Audio Processing
