Segmentation-Variant Codebooks for Preservation of Paralinguistic and Prosodic Information
Nicholas Sanders, Yuanchao Li, Korin Richmond, Simon King

TL;DR
This paper introduces Segmentation-Variant Codebooks (SVCs) that quantize speech at different linguistic levels to better preserve prosodic and paralinguistic features in speech compression and synthesis tasks.
Contribution
The paper proposes a novel segmentation-variant codebook approach that factorizes speech into multiple segment-specific streams, improving preservation of paralinguistic information.
Findings
SVCs preserve prosodic and paralinguistic information more effectively than conventional codebooks across probing tasks.
Pooling before, rather than after, discretization better retains segment-level information.
Resynthesis with SVCs improves style realization and slightly improves quality while maintaining intelligibility.
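The difference between pooling before and pooling after discretization can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the toy 2-D features, the three-entry codebook, mean pooling for the "before" case, and majority voting over frame codes for the "after" case. The function names are hypothetical.

```python
from collections import Counter
from math import dist


def nearest_code(vec, codebook):
    """Return the index of the codebook entry closest to vec (Euclidean)."""
    return min(range(len(codebook)), key=lambda k: dist(vec, codebook[k]))


def mean_pool(frames):
    """Average a list of equal-length feature vectors dimension by dimension."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def pool_then_quantize(frames, segments, codebook):
    """Pool before discretization: average each segment, then assign one code."""
    return [nearest_code(mean_pool(frames[s:e]), codebook) for s, e in segments]


def quantize_then_pool(frames, segments, codebook):
    """Pool after discretization: code every frame, then majority-vote per segment."""
    codes = [nearest_code(f, codebook) for f in frames]
    return [Counter(codes[s:e]).most_common(1)[0][0] for s, e in segments]


# Toy data (assumed): 7 frames of 2-D features, two segments given as [start, end).
frames = [[0.0, 0.0], [0.0, 0.0],
          [0.0, 0.0], [0.0, 0.0], [4.0, 4.0], [4.0, 4.0], [4.0, 4.0]]
segments = [(0, 2), (2, 7)]
codebook = [[0.0, 0.0], [2.0, 2.0], [4.0, 4.0]]

pre = pool_then_quantize(frames, segments, codebook)
post = quantize_then_pool(frames, segments, codebook)
print(pre, post)
```

In this toy case the two orderings disagree on the second segment: averaging first lands on the middle codebook entry, while voting over frame codes picks the entry that most frames map to. The sketch only shows that the order of operations can change the resulting code; it says nothing about which choice retains more information in practice.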
Abstract
Quantization in self-supervised learning (SSL) speech models (e.g., HuBERT) improves compression and performance in tasks like language modeling, resynthesis, and text-to-speech, but often discards prosodic and paralinguistic information (e.g., emotion, prominence). While increasing codebook size mitigates some of this loss, it raises bitrates inefficiently. We propose Segmentation-Variant Codebooks (SVCs), which quantize speech at the level of distinct linguistic units (frame, phone, word, utterance), factorizing it into multiple streams of segment-specific discrete features. Our results show that SVCs are significantly more effective at preserving prosodic and paralinguistic information across probing tasks. Additionally, we find that pooling before rather than after discretization better retains segment-level information. Resynthesis experiments further confirm improved style realization and slightly improved quality while…
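The link between codebook size and bitrate mentioned in the abstract can be made concrete with a short calculation. This is a sketch under stated assumptions, not the paper's accounting: it assumes fixed-length codes, so each unit costs log2(K) bits for a codebook of size K, and every rate and codebook size below is a hypothetical illustration value.

```python
from math import log2


def stream_bitrate(units_per_second, codebook_size):
    """Bits per second for one discrete stream, assuming fixed-length codes."""
    return units_per_second * log2(codebook_size)


def total_bitrate(streams):
    """Sum the bitrates of several (units_per_second, codebook_size) streams."""
    return sum(stream_bitrate(rate, size) for rate, size in streams)


# Hypothetical frame-level stream at 50 units/s with two codebook sizes.
small = stream_bitrate(50, 512)
large = stream_bitrate(50, 2048)

# Hypothetical multi-stream setup: rates for the coarser levels are assumed.
multi = total_bitrate([(50, 512), (10, 64), (3, 64)])

print(small, large, multi)
```

Under this assumption, each doubling of a frame-level codebook adds one bit per frame, so the cost grows with the frame rate, whereas a stream with fewer units per second adds bits only in proportion to its own rate.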
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques
