Automatic Prosody Annotation with Pre-Trained Text-Speech Model

Ziqian Dai; Jianwei Yu; Yan Wang; Nuo Chen; Yanyao Bian; Guangzhi Li; Deng Cai; Dong Yu

arXiv:2206.07956·cs.SD·June 17, 2022

TL;DR

This paper introduces a neural text-speech model that automatically extracts prosodic boundary labels from paired text-audio data, improving TTS naturalness and reducing manual annotation effort.

Contribution

The paper presents a text-speech neural model with pre-trained audio encoders that automatically annotates prosodic boundaries, outperforming text-only baselines and producing annotations comparable in quality to human labels.

Findings

01 Model outperforms text-only baselines

02 Automatic annotations are comparable to human labels

03 TTS trained with model annotations performs slightly better

Abstract

Prosodic boundaries play an important role in text-to-speech synthesis (TTS) in terms of naturalness and readability. However, the acquisition of prosodic boundary labels relies on manual annotation, which is costly and time-consuming. In this paper, we propose to automatically extract prosodic boundary labels from text-audio data via a neural text-speech model with pre-trained audio encoders. This model is pre-trained on text and speech data separately and jointly fine-tuned on TTS data in a triplet format: {speech, text, prosody}. The experimental results on both automatic evaluation and human evaluation demonstrate that: 1) the proposed text-speech prosody annotation framework significantly outperforms text-only baselines; 2) the quality of automatic prosodic boundary annotations is comparable to human annotations; 3) TTS systems trained with model-annotated boundaries are slightly…
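
The {speech, text, prosody} triplet mentioned in the abstract can be pictured as a simple record. The sketch below is illustrative only and is not taken from the paper or its repository: the class name `ProsodyTriplet`, the field types, the integer label ids, and the one-label-per-token alignment are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProsodyTriplet:
    """One fine-tuning example in {speech, text, prosody} form (hypothetical layout)."""
    speech: List[float]  # waveform samples; the real input representation is not given in the abstract
    text: List[str]      # token sequence (characters or words, assumed)
    prosody: List[int]   # assumed: one boundary label id per token, 0 = no boundary

    def __post_init__(self) -> None:
        # Assumption: labels are aligned one-to-one with text tokens.
        if len(self.text) != len(self.prosody):
            raise ValueError("expected one prosody label per text token")


def boundary_positions(example: ProsodyTriplet, no_boundary: int = 0) -> List[int]:
    """Return the indices of tokens that carry a boundary label."""
    return [i for i, label in enumerate(example.prosody) if label != no_boundary]


# Toy example with made-up label ids (1 and 2 stand for two boundary levels).
example = ProsodyTriplet(
    speech=[0.0] * 16000,
    text=["the", "cat", "sat", "on", "the", "mat"],
    prosody=[0, 1, 0, 0, 0, 2],
)
print(boundary_positions(example))
```

Under these assumptions, an annotation model would take `speech` and `text` as input and predict the `prosody` sequence, which is the part that otherwise requires manual labelling.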

Code & Models

Repositories

daisyqk/automatic-prosody-annotation (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems