SPG-Codec: Exploring the Role and Boundaries of Semantic Priors in Ultra-Low-Bitrate Neural Speech Coding
Mingyu Zhao, Zijian Lin, Kun Wei, Zhiyong Wu

TL;DR
This paper investigates the integration of semantic priors in neural speech coding at ultra-low bitrates, revealing their limits and trade-offs, and proposes a dynamic regulation strategy for improved speech quality and robustness.
Contribution
It systematically analyzes the role of semantic priors like HuBERT and Whisper in speech coding, introduces the Semantic Retirement phenomenon, and proposes a bitrate-aware adjustment method.
Findings
Semantic constraints reduce Word Error Rate (WER) by up to ~10% in relative terms at 1.5 kbps.
Benefits of semantic priors diminish beyond 6 kbps, indicating a capacity boundary.
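High-level linguistic priors (Whisper) suppress phonetic hallucinations and improve generalization in noisy environments, while acoustic-rich priors (HuBERT) better preserve prosodic and timbral details.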
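The WER finding is a relative reduction, which is easy to misread as an absolute drop in percentage points. The sketch below shows the difference; the helper names and the numeric values are illustrative assumptions, not results from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline WER that was removed (0.10 means 10% relative)."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


def absolute_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Plain difference between the two WER values (as fractions)."""
    return baseline_wer - new_wer


# Hypothetical numbers chosen only to illustrate the arithmetic.
baseline = 0.20    # assumed WER of a codec without a semantic prior
with_prior = 0.18  # assumed WER of the same codec with a semantic prior

rel = relative_wer_reduction(baseline, with_prior)
abs_points = absolute_wer_reduction(baseline, with_prior) * 100

print(f"relative reduction: {rel:.1%}")
print(f"absolute reduction: {abs_points:.1f} percentage points")
```

With these assumed values, a 10% relative reduction corresponds to only 2 percentage points of absolute WER, so the size of the absolute gain depends on how high the baseline WER is at a given bitrate.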
Abstract
Conventional neural speech codecs suffer from severe intelligibility degradation at ultra-low bitrates, where the bottleneck transitions from acoustic distortion to semantic loss. To address this issue, this paper conducts a systematic investigation into the role and fundamental limits of integrating frozen semantic priors -- specifically HuBERT and Whisper -- into neural speech coding. We introduce and quantitatively validate a novel Semantic Retirement phenomenon: while semantic constraints reduce the Word Error Rate (WER) by up to ~10% in relative terms at 1.5 kbps, their benefits rapidly diminish beyond 6 kbps, indicating a practical capacity boundary. We further uncover a clear trade-off between different prior types: acoustic-rich priors (HuBERT) better preserve prosodic and timbral details, whereas high-level linguistic priors (Whisper) effectively suppress phonetic hallucinations in…
