Residual Speech Embeddings for Tone Classification: Removing Linguistic Content to Enhance Paralinguistic Analysis
Hamdan Al Ahbabi, Gautier Marti, Saeed AlMarri, Ibrahim Elfadel

TL;DR
This paper presents a residual embedding method that separates linguistic content from speech representations by regressing speech embeddings onto their text embeddings and keeping the residuals. The authors report that this significantly improves tone classification accuracy and makes paralinguistic features easier to analyze.
Contribution
The authors introduce a residual embedding technique that removes linguistic content from self-supervised speech embeddings, enhancing tone classification and paralinguistic analysis.
Findings
Residual embeddings improve tone classification accuracy over raw speech embeddings.
Linear separability of tone features is enhanced, so even simple models such as logistic regression classify tone better.
Linguistic content is effectively removed while preserving tone information.
Abstract
Self-supervised learning models for speech processing, such as wav2vec2, HuBERT, WavLM, and Whisper, generate embeddings that capture both linguistic and paralinguistic information, making it challenging to analyze tone independently of spoken content. In this work, we introduce a method for disentangling paralinguistic features from linguistic content by regressing speech embeddings onto their corresponding text embeddings and using the residuals as a representation of vocal tone. We evaluate this approach across multiple self-supervised speech embeddings, demonstrating that residual embeddings significantly improve tone classification performance compared to raw speech embeddings. Our results show that this method enhances linear separability, enabling improved classification even with simple models such as logistic regression. Visualization of the residual embeddings further confirms…
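The regression step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes an ordinary least-squares linear regression with an intercept (the abstract does not say which regressor is used), and it runs on synthetic arrays rather than real wav2vec2, HuBERT, WavLM, or Whisper embeddings. The names residual_embeddings, mixing, and tone_dir are hypothetical.

```python
import numpy as np


def residual_embeddings(speech, text):
    """Return the part of `speech` that a linear map of `text` cannot predict.

    speech: (n_utterances, d_speech) speech embeddings
    text:   (n_utterances, d_text) text embeddings of the same utterances
    Assumption: plain least-squares regression with an intercept.
    """
    # Append a column of ones so the fit includes an intercept term.
    X = np.hstack([text, np.ones((text.shape[0], 1))])
    # Solve min_W ||speech - X @ W||^2 for all speech dimensions at once.
    W, *_ = np.linalg.lstsq(X, speech, rcond=None)
    # The residual is what remains after removing the text-predictable part.
    return speech - X @ W


# Synthetic stand-in data (assumption): speech = content part + tone part + noise.
rng = np.random.default_rng(0)
n, d_text, d_speech = 200, 8, 16
text = rng.normal(size=(n, d_text))
tone = rng.integers(0, 2, size=n)              # hypothetical binary tone label
mixing = rng.normal(size=(d_text, d_speech))   # hypothetical content-to-speech map
tone_dir = rng.normal(size=d_speech)           # hypothetical tone direction
noise = 0.1 * rng.normal(size=(n, d_speech))
speech = text @ mixing + np.outer(tone, tone_dir) + noise

residual = residual_embeddings(speech, text)
```

By construction, least-squares residuals are uncorrelated with the text embeddings used in the fit, which is the sense in which linear content is removed here. In a real experiment the regression would presumably be fit on a training split and applied to held-out utterances, and the residuals would then be passed to a classifier such as logistic regression.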
Taxonomy
Topics: Speech Recognition and Synthesis · Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining
