Syntactic representation learning for neural network based TTS with syntactic parse tree traversal

Changhe Song; Jingbei Li; Yixuan Zhou; Zhiyong Wu; Helen Meng

arXiv:2012.06971 · cs.CL · December 15, 2020 · 1 citation

TL;DR

This paper introduces a novel method for automatically learning syntactic representations from parse trees to improve neural TTS systems, resulting in more natural speech synthesis.

Contribution

It proposes a syntactic representation learning approach using parse tree traversal and GRU networks, enhancing prosody and naturalness in TTS without manual feature design.

Findings

01. MOS increased from 3.70 to 3.82

02. ABX preference exceeded baseline by 17%

03. Prosodic differences are perceptible in multi-parse sentences

Abstract

The syntactic structure of a sentence is correlated with the prosodic structure of its speech, which is crucial for improving the prosody and naturalness of a text-to-speech (TTS) system. Current TTS systems usually incorporate syntactic structure information through manually designed features based on expert knowledge. In this paper, we propose a syntactic representation learning method based on syntactic parse tree traversal to automatically utilize the syntactic structure information. Two constituent label sequences are linearized from the constituent parse tree through left-first and right-first traversals. Syntactic representations are then extracted at word level from each constituent label sequence by a corresponding uni-directional gated recurrent unit (GRU) network. Meanwhile, nuclear-norm maximization loss is introduced to enhance the discriminability and diversity of the…
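The two traversals mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the tree encoding, the `linearize` function, the choice of a pre-order depth-first walk, and the idea of recording each word's position in the label sequence are all assumptions made for illustration, since the abstract does not specify these details.

```python
from typing import List, Tuple, Union

# Hypothetical tree encoding (an assumption, not from the paper):
# an internal node is (label, [children]); a preterminal is (pos_tag, word).
Node = Tuple[str, Union[str, list]]


def linearize(tree: Node, right_first: bool = False) -> Tuple[List[str], List[Tuple[str, int]]]:
    """Pre-order depth-first traversal of a constituent tree.

    Returns the constituent label sequence and, for each word, the index
    of its preterminal label in that sequence.
    """
    labels: List[str] = []
    word_positions: List[Tuple[str, int]] = []

    def visit(node: Node) -> None:
        label, body = node
        labels.append(label)
        if isinstance(body, str):  # preterminal: body is the word itself
            word_positions.append((body, len(labels) - 1))
            return
        # Visit children left-to-right, or right-to-left when right_first is set.
        children = reversed(body) if right_first else body
        for child in children:
            visit(child)

    visit(tree)
    return labels, word_positions


# Toy parse: (S (NP (DT the) (NN cat)) (VP (VBD sat)))
tree = ("S", [
    ("NP", [("DT", "the"), ("NN", "cat")]),
    ("VP", [("VBD", "sat")]),
])

left_labels, left_words = linearize(tree)
right_labels, right_words = linearize(tree, right_first=True)
print(left_labels)   # ['S', 'NP', 'DT', 'NN', 'VP', 'VBD']
print(right_labels)  # ['S', 'VP', 'VBD', 'NP', 'NN', 'DT']
```

In a setup like the one the abstract describes, each label sequence would be fed to its own uni-directional GRU, and the recorded word positions are one plausible way to read out word-level states; how the paper actually aligns GRU outputs to words is not stated in the abstract.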

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling