Phrase break prediction with bidirectional encoder representations in Japanese text-to-speech synthesis

Kosuke Futamata; Byeongseon Park; Ryuichi Yamamoto; Kentaro Tachibana

arXiv:2104.12395·eess.AS·April 27, 2021

TL;DR

This paper introduces a new phrase break prediction method for Japanese TTS that combines features from a pre-trained language model and BiLSTM, improving accuracy and naturalness of synthesized speech.

Contribution

It integrates implicit BERT features with explicit linguistic features in a BiLSTM framework, enhancing phrase break prediction in Japanese TTS systems.

Findings

01 A 3.2-point absolute improvement in F1 score over conventional BiLSTM-based methods that use linguistic features.

02 A mean opinion score of 4.39 for prosody naturalness, close to ground truth.

03 An effective combination of implicit and explicit features for better phrase break prediction.

Abstract

We propose a novel phrase break prediction method that combines implicit features extracted from a pre-trained large language model, a.k.a. BERT, and explicit features extracted from BiLSTM with linguistic features. In conventional BiLSTM-based methods, word representations and/or sentence representations are used as independent components. The proposed method takes both representations into account to extract the latent semantics, which cannot be captured by previous methods. The objective evaluation results show that the proposed method obtains an absolute improvement of 3.2 points in F1 score compared with conventional BiLSTM-based methods using linguistic features. Moreover, the perceptual listening test results verify that a TTS system that applied our proposed method achieved a mean opinion score of 4.39 in prosody naturalness, which is highly competitive with the score of 4.37…
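
The objective result in the abstract is stated in F1 points. As a reminder of how such a score can be computed when phrase break prediction is treated as per-token binary labeling, here is a minimal sketch. The function name `break_f1`, the 1/0 label encoding, and the toy label sequences are assumptions for illustration only; they are not the paper's data or evaluation script, and the paper may average F1 differently.

```python
def break_f1(gold, pred, positive=1):
    """Return (precision, recall, F1) for the phrase-break class.

    gold, pred: equal-length sequences of labels, one per token.
    positive:   the label that marks "a break follows this token"
                (assumed encoding, not taken from the paper).
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    # Count true positives, false positives and false negatives.
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example: 1 = break after the token, 0 = no break (hypothetical labels).
gold = [0, 1, 0, 0, 1, 0, 1, 0]
pred = [0, 1, 0, 1, 1, 0, 0, 0]
precision, recall, f1 = break_f1(gold, pred)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

On this 0 to 1 scale, an absolute improvement of 3.2 points corresponds to a difference of 0.032 between two systems' F1 scores.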

Code & Models

Repositories

anandaswarup/phrase_break_prediction (PyTorch)

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling

Methods: Attention Is All You Need · Linear Layer · Dropout · Dense Connections · Tanh Activation · Linear Warmup With Linear Decay · WordPiece · Softmax · Sigmoid Activation