EmoSpeech: A Corpus of Emotionally Rich and Contextually Detailed Speech Annotations

Weizhen Bian; Yubo Zhou; Kaitai Zhang; Xiaohan Gu

arXiv:2412.06581·cs.SD·December 13, 2024

Open Access

TL;DR

This paper introduces EmoSpeech, a richly annotated emotional speech database created using a generative model to extract and describe speech segments, enabling more nuanced emotional control in TTS systems.

Contribution

It presents a novel automated method for building emotionally rich speech databases with detailed natural language annotations, reducing manual effort and increasing emotional granularity.

Findings

01. Enhanced emotional granularity in the speech database
02. Reduced reliance on manual annotation
03. Scalable and cost-effective data augmentation

Abstract

Advances in text-to-speech (TTS) technology have significantly improved the quality of generated speech, closely matching the timbre and intonation of the target speaker. However, because human emotional expression is inherently complex, developing TTS systems that can control subtle emotional differences remains a formidable challenge. Existing emotional speech databases often rely on overly simplistic labelling schemes that fail to capture a wide range of emotional states, which limits the effectiveness of emotion synthesis in TTS applications. In response, recent efforts have focused on building databases that describe speech emotions with natural language annotations. However, these approaches are costly and lack the emotional depth needed to train robust systems. In this paper, we propose a novel process aimed at building databases by systematically…
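To make the contrast drawn in the abstract concrete, here is a minimal sketch comparing a record that carries a single categorical emotion label with one that carries a natural language description. The class names, field names, label set, file name, and example text are all hypothetical illustrations; they are not the EmoSpeech data format or the authors' pipeline.

```python
from dataclasses import dataclass

# Hypothetical record layouts for illustration only; these names are
# assumptions and do not describe the actual EmoSpeech schema.

# A small fixed label set, typical of simple categorical schemes (assumed).
EMOTION_SET = {"neutral", "happy", "sad", "angry", "surprised"}


@dataclass
class CategoricalSample:
    audio_path: str
    transcript: str
    emotion: str  # must be one label from EMOTION_SET


@dataclass
class DescribedSample:
    audio_path: str
    transcript: str
    description: str  # free-text account of how the emotion is expressed


def is_valid_categorical(sample: CategoricalSample) -> bool:
    """A categorical scheme can only choose from the fixed label set."""
    return sample.emotion in EMOTION_SET


coarse = CategoricalSample("clip_001.wav", "I can't believe it.", "surprised")
rich = DescribedSample(
    "clip_001.wav",
    "I can't believe it.",
    "Quiet, halting disbelief that shifts to restrained excitement "
    "toward the end of the sentence.",
)
```

Under these assumptions, a shade of emotion outside the fixed set (for example "wistful") fails the categorical check, while the free-text field can record it, along with how the emotion changes within the utterance.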

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems