A Discourse-level Multi-scale Prosodic Model for Fine-grained Emotion Analysis

Xianhao Wei; Jia Jia; Xiang Li; Zhiyong Wu; Ziyi Wang

arXiv:2309.11849 · cs.SD · September 22, 2023

TL;DR

This paper introduces a discourse-level multi-scale prosodic model for fine-grained emotion analysis. The model predicts prosodic embeddings from multi-scale text features, with the aim of improving emotional prosody prediction and expressive speech synthesis.

Contribution

It proposes D-MPM, a novel model that exploits multi-scale, discourse-level text to predict prosodic features, supporting both emotional speech synthesis and emotion analysis.

Findings

01. Multi-scale text improves prosodic feature prediction.

02. Discourse-level text enhances speech coherence and user experience.

03. The quality of the synthesized speech surpasses that of the style transfer model in some evaluations.

Abstract

This paper explores predicting suitable prosodic features for fine-grained emotion analysis from discourse-level text. To obtain fine-grained emotional prosodic features as prediction targets for our model, we use a style transfer model to extract two prosodic features from the speech: a phoneme-level Local Prosody Embedding sequence (LPEs) and a Global Style Embedding. We propose a Discourse-level Multi-scale text Prosodic Model (D-MPM) that exploits multi-scale text to predict these two prosodic features. The proposed model can be used to better analyze emotional prosodic features and thus guide the speech synthesis model to synthesize more expressive speech. To quantitatively evaluate the proposed model, we contribute a new, large-scale Discourse-level Chinese Audiobook (DCA) dataset with more than 13,000 annotated utterance sequences to evaluate the proposed…
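The abstract describes the prediction setup only at a high level: multi-scale text goes in, and a phoneme-level LPE sequence plus one Global Style Embedding come out. The Python sketch below is a toy illustration of that input/output structure under stated assumptions. It is not the D-MPM architecture. The three scales (phoneme, sentence, discourse window), mean pooling, feature concatenation, and hand-set linear heads are all assumptions chosen for readability. Every name in it (`predict_prosody`, `mean_pool`, `linear`, `ProsodyPrediction`) is hypothetical. In a real model these mappings would be learned; here the weights are supplied by hand.

```python
# Toy sketch (assumption): illustrates multi-scale text -> (LPE sequence, GSE)
# shapes only. It is NOT the authors' D-MPM; every name here is hypothetical.
from dataclasses import dataclass
from typing import List, Sequence

Vector = List[float]


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("cannot pool an empty sequence")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def linear(weights: Sequence[Vector], x: Vector) -> Vector:
    """Hypothetical bias-free linear head: one output value per weight row."""
    out = []
    for row in weights:
        if len(row) != len(x):
            raise ValueError(f"weight row has {len(row)} inputs, expected {len(x)}")
        out.append(sum(w * xi for w, xi in zip(row, x)))
    return out


@dataclass
class ProsodyPrediction:
    local_prosody: List[Vector]  # one vector per phoneme (stand-in for the LPEs)
    global_style: Vector         # one vector per utterance (stand-in for the GSE)


def predict_prosody(
    discourse: Sequence[Sequence[Vector]],  # sentences -> phonemes -> text features
    target: int,                            # index of the sentence to predict for
    context: int,                           # neighbouring sentences on each side
    w_local: Sequence[Vector],
    w_global: Sequence[Vector],
) -> ProsodyPrediction:
    if not 0 <= target < len(discourse):
        raise IndexError("target sentence index out of range")
    # Sentence scale: mean of the target sentence's phoneme features.
    sentence_vec = mean_pool(discourse[target])
    # Discourse scale: mean of the sentence vectors inside the context window.
    lo, hi = max(0, target - context), min(len(discourse), target + context + 1)
    discourse_vec = mean_pool([mean_pool(s) for s in discourse[lo:hi]])
    # Phoneme scale: each phoneme's features concatenated with both coarser scales.
    local = [linear(w_local, list(ph) + sentence_vec + discourse_vec)
             for ph in discourse[target]]
    # Utterance scale: one style vector from the sentence and discourse context.
    global_style = linear(w_global, sentence_vec + discourse_vec)
    return ProsodyPrediction(local, global_style)


if __name__ == "__main__":
    # Three sentences with 2-dimensional hypothetical phoneme features.
    discourse = [
        [[0.0, 0.0]],
        [[2.0, 2.0], [4.0, 0.0], [0.0, 4.0]],
        [[4.0, 4.0]],
    ]
    w_local = [[1, 0, 0, 0, 0, 0], [0, 0, 1, 0, 0, 1]]  # 6 inputs -> 2 outputs
    w_global = [[1, 1, 1, 1]]                           # 4 inputs -> 1 output
    pred = predict_prosody(discourse, target=1, context=1,
                           w_local=w_local, w_global=w_global)
    print(pred.local_prosody)
    print(pred.global_style)
```

Concatenating the coarser-scale vectors onto every phoneme is one simple way to let sentence and discourse context influence each local prediction. The output keeps the two granularities named in the abstract: a sequence with one entry per phoneme and a single utterance-level vector.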

Taxonomy

Topics: Sentiment Analysis and Opinion Mining · Emotion and Mood Recognition