RWEN-TTS: Relation-aware Word Encoding Network for Natural Text-to-Speech Synthesis

Shinhyeok Oh; HyeongRae Noh; Yoonseok Hong; Insoo Oh

arXiv:2212.07939 · cs.CL · December 16, 2022

TL;DR

RWEN-TTS introduces a relation-aware encoding network that effectively incorporates syntactic, semantic, and adjacency information to significantly improve naturalness in text-to-speech synthesis.

Contribution

It proposes a novel relation-aware encoding network with modules for semantic and adjacent word relations, addressing limitations of previous TTS models.

Findings

01 Significant improvement over previous TTS models.

02 Effective utilization of syntactic and semantic information.

03 Enhanced naturalness and expressiveness in synthesized speech.

Abstract

With the advent of deep learning, a huge number of text-to-speech (TTS) models that produce human-like speech have emerged. Recently, various approaches have been proposed that introduce syntactic and semantic information about the input text to enrich the naturalness and expressiveness of TTS models. Although these strategies have shown impressive results, they still have limitations in utilizing language information. First, most approaches only use graph networks to utilize syntactic and semantic information without considering linguistic features. Second, most previous works do not explicitly consider adjacent words when encoding syntactic and semantic information, even though adjacent words are usually meaningful when encoding the current word. To address these issues, we propose Relation-aware Word Encoding Network (RWEN), which effectively allows syntactic…

Code & Models

Repositories

shinhyeokoh/rwen · PyTorch · Official

Videos

RWEN-TTS: Relation-aware Word Encoding Network for Natural Text-to-Speech Synthesis · Underline

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis