Automatic Prosody Prediction for Chinese Speech Synthesis using BLSTM-RNN and Embedding Features

Chuang Ding; Lei Xie; Jie Yan; Weini Zhang; Yang Liu

arXiv:1511.00360·cs.CL·November 3, 2015·5 cites

TL;DR

This paper introduces a neural network approach that uses BLSTM-RNN layers and embedding features for automatic prosody prediction in Chinese speech synthesis. It removes the need for feature engineering and outperforms the traditional CRF-based method.

Contribution

The paper presents a novel neural network-based method that directly predicts prosodic boundaries from Chinese characters using embedding features, surpassing CRF-based approaches.

Findings

01 BLSTM-RNN outperforms CRF in prosody prediction

02 Embedding features improve prediction accuracy

03 Neural network approach reduces feature engineering effort

Abstract

Prosody affects the naturalness and intelligibility of speech. However, automatic prosody prediction from text for Chinese speech synthesis is still a great challenge, and the traditional conditional random field (CRF) based method relies heavily on feature engineering. In this paper, we propose to use neural networks to predict prosodic boundary labels directly from Chinese characters without any feature engineering. Experimental results show that stacking feed-forward and bidirectional long short-term memory (BLSTM) recurrent network layers achieves superior performance over the CRF-based method. The embedding features learned from raw text further enhance the performance.
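
The abstract frames prosody prediction as assigning a boundary label to each Chinese character. As a rough illustration of that framing only, the sketch below converts prosodically segmented text into per-character labels and back. The two-way tag set (`B` / `NB`), the function names, and the example sentence are all assumptions made for illustration; this summary does not state the paper's actual label inventory or data format.

```python
from typing import List, Tuple

# Hypothetical tag names; the paper's real label set is not given in this summary.
BOUNDARY = "B"      # a prosodic boundary follows this character
NO_BOUNDARY = "NB"  # no prosodic boundary follows this character


def chunks_to_char_labels(chunks: List[str]) -> List[Tuple[str, str]]:
    """Turn prosodically segmented text into (character, label) pairs.

    The last character of every chunk is tagged BOUNDARY, all others NO_BOUNDARY.
    """
    labeled: List[Tuple[str, str]] = []
    for chunk in chunks:
        for i, ch in enumerate(chunk):
            is_last = i == len(chunk) - 1
            labeled.append((ch, BOUNDARY if is_last else NO_BOUNDARY))
    return labeled


def char_labels_to_chunks(labeled: List[Tuple[str, str]]) -> List[str]:
    """Rebuild the segmented text from per-character labels (e.g. model output)."""
    chunks: List[str] = []
    current = ""
    for ch, label in labeled:
        current += ch
        if label == BOUNDARY:
            chunks.append(current)
            current = ""
    if current:  # trailing characters with no closing boundary
        chunks.append(current)
    return chunks


# Illustrative input, not taken from the paper's data.
example = ["今天", "天气", "很好"]
pairs = chunks_to_char_labels(example)
restored = char_labels_to_chunks(pairs)
```

In this framing, a sequence model such as the stacked feed-forward and BLSTM network described in the abstract would read the character sequence and emit one label per character; `char_labels_to_chunks` shows how such predictions could be turned back into prosodic chunks.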

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling