Emotion-Aware Quantization for Discrete Speech Representations: An Analysis of Emotion Preservation

Haoguang Zhou; Siyi Wang; Jingyao Wu; James Bailey; Ting Dang

arXiv:2603.21224·cs.SD·March 24, 2026

Open Access

TL;DR

This paper investigates how discretized speech representations affect emotional information preservation and introduces emotion-aware quantization techniques to enhance emotion retention in compressed speech models.

Contribution

The paper presents emotion-aware quantization methods, namely emotion-specific and emotion-biased codebooks and the routed quantizer Emo-Q, to improve emotion preservation in discrete speech representations.

Findings

01. Aggressive compression degrades emotion, with uneven effects across classes.

02. Emotion-aware quantization improves emotion perception at lower bitrates.

03. Emo-Q enhances emotion recognition performance with lightweight routing.

Abstract

Modern speech systems increasingly use discretized self-supervised speech representations for compression and integration with token-based models, yet their impact on emotional information remains unclear. We study how residual vector quantization (RVQ) reshapes emotional information in discrete speech representations from both representation- and task-level perspectives. Our analysis shows that aggressive compression disproportionately degrades emotion, with uneven loss across emotion classes and model architectures. To address this, we introduce emotion-aware quantization using emotion-specific and emotion-biased codebooks, improving the preservation of both hard and soft emotion perception. We further propose Emo-Q, a lightweight routed quantization method that selects emotion-specialized codebooks, improving emotion recognition performance at lower bitrates. These results highlight…
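For readers unfamiliar with the residual vector quantization (RVQ) the abstract refers to, the following is a minimal generic sketch, not the authors' implementation. Each stage picks the nearest codeword to whatever the previous stages left unexplained, so using fewer stages or smaller codebooks lowers the bitrate. The function names (`rvq_encode`, `bits_per_frame`), the random codebooks, and the dimensions are all illustrative assumptions; real systems learn the codebooks, and the emotion-aware codebooks and Emo-Q routing described in the paper are not reproduced here.

```python
import numpy as np

def rvq_encode(x, codebooks):
    """Generic RVQ: x has shape (n, d); each codebook has shape (k, d)."""
    residual = x.astype(float)          # astype returns a copy, so x is untouched
    quantized = np.zeros_like(residual)
    codes = []
    for cb in codebooks:
        # Squared Euclidean distance from every residual to every codeword: (n, k)
        dists = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(axis=-1)
        idx = dists.argmin(axis=1)      # nearest codeword per frame
        chosen = cb[idx]
        quantized += chosen             # accumulate the reconstruction
        residual -= chosen              # pass the remainder to the next stage
        codes.append(idx)
    return np.stack(codes, axis=1), quantized

def bits_per_frame(codebooks):
    """Bits needed to store one index per stage for a single frame."""
    return float(sum(np.log2(len(cb)) for cb in codebooks))

# Hypothetical data: 50 frames of 16-dim features, 4 stages of 256 codewords.
rng = np.random.default_rng(0)
frames = rng.normal(size=(50, 16))
codebooks = [rng.normal(size=(256, 16)) * (0.5 ** s) for s in range(4)]

codes, approx = rvq_encode(frames, codebooks)
print(codes.shape, bits_per_frame(codebooks))
```

In this setup each frame is stored as four 8-bit indices, so dropping a stage saves 8 bits per frame at the cost of a coarser reconstruction. That trade-off is the kind of compression whose effect on emotional information the paper analyzes.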

Taxonomy

Topics: Emotion and Mood Recognition · Speech Recognition and Synthesis · Advanced Data Compression Techniques