EMO-Codec: An In-Depth Look at Emotion Preservation capacity of Legacy and Neural Codec Models With Subjective and Objective Evaluations

Wenze Ren; Yi-Cheng Lin; Huang-Cheng Chou; Haibin Wu; Yi-Chiao Wu; Chi-Chun Lee; Hung-yi Lee; Yu Tsao

arXiv:2407.15458 · eess.AS · July 31, 2024

TL;DR

This study evaluates how well neural and legacy speech codecs preserve emotional information using subjective and objective assessments, revealing limitations in emotion retention, especially across languages and certain emotions.

Contribution

The paper provides a comprehensive analysis of emotion preservation in codecs, highlighting how training data and resynthesis affect the retention of emotional content.

Findings

01 Neural codecs show limited emotion preservation, especially in Chinese.

02 Resynthesis degrades speech emotion recognition accuracy.

03 Human tests confirm emotion loss in codec processing.
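
The second finding can be made concrete with a small sketch. Assuming a speech emotion recognition (SER) classifier has been run on both the original utterances and their codec-resynthesized versions, one simple way to quantify the loss is the per-emotion drop in recall. This is an illustrative measure, not necessarily the metric used in the paper; the function names and the toy labels below are hypothetical and are not results from the study.

```python
from collections import defaultdict


def per_emotion_recall(true_labels, predicted_labels):
    """Return {emotion: recall}, the fraction of each true emotion predicted correctly."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(true_labels, predicted_labels):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return {emotion: correct[emotion] / total[emotion] for emotion in total}


def recall_drop(true_labels, preds_original, preds_resynth):
    """Return {emotion: recall on original audio minus recall on resynthesized audio}."""
    before = per_emotion_recall(true_labels, preds_original)
    after = per_emotion_recall(true_labels, preds_resynth)
    return {emotion: before[emotion] - after[emotion] for emotion in before}


# Toy labels invented purely for illustration (not data from the paper).
truth = ["sad", "sad", "happy", "happy"]
on_original = ["sad", "sad", "happy", "happy"]
on_resynth = ["sad", "neutral", "happy", "happy"]

drop = recall_drop(truth, on_original, on_resynth)
print(drop)
```

A positive value for an emotion means the classifier recognized it less often after the audio passed through the codec, which is the kind of per-emotion degradation the abstract describes.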

Abstract

Neural codec models reduce speech data transmission delay and serve as the foundational tokenizers for speech language models (speech LMs). Preserving emotional information in codecs is crucial for effective communication and context understanding. However, there is a lack of studies on emotion loss in existing codecs. This paper evaluates neural and legacy codecs using subjective and objective methods on emotion datasets such as IEMOCAP. Our study identifies which codecs best preserve emotional information under various bitrate scenarios. We found that training codec models with both English and Chinese data had limited success in retaining emotional information in Chinese. Additionally, resynthesizing speech through these codecs degrades the performance of speech emotion recognition (SER), particularly for emotions such as sadness, depression, fear, and disgust. Human listening tests…
