ADEPT: A Dataset for Evaluating Prosody Transfer

Alexandra Torresquintero; Tian Huey Teh; Christopher G. R. Wallis; Marlene Staib; Devang S Ram Mohan; Vivian Hu; Lorenzo Foglianti; Jiameng Gao; Simon King

arXiv:2106.08321 · eess.AS · July 22, 2021 · Interspeech

Open Access

TL;DR

This paper introduces ADEPT, a new dataset designed to evaluate prosody transfer in text-to-speech systems, providing a benchmark and methodology for measuring how well models replicate nuanced prosodic features.

Contribution

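The paper presents ADEPT, a dataset of natural speech samples with global prosodic variations (emotion, interpersonal attitude) and local ones (topical emphasis, propositional attitude, syntactic phrasing, marked tonicity), together with an evaluation framework for assessing prosody transfer in TTS models, addressing the lack of standardized benchmarks.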

Findings

01. Listeners can distinguish prosodic variations with reasonable accuracy.

02. The dataset provides benchmark figures for prosody transfer success.

03. Evaluation of two TTS models demonstrates the dataset's utility.

Abstract

Text-to-speech is now able to achieve near-human naturalness and research focus has shifted to increasing expressivity. One popular method is to transfer the prosody from a reference speech sample. There have been considerable advances in using prosody transfer to generate more expressive speech, but the field lacks a clear definition of what successful prosody transfer means and a method for measuring it. We introduce a dataset of prosodically-varied reference natural speech samples for evaluating prosody transfer. The samples include global variations reflecting emotion and interpersonal attitude, and local variations reflecting topical emphasis, propositional attitude, syntactic phrasing and marked tonicity. The corpus only includes prosodic variations that listeners are able to distinguish with reasonable accuracy, and we report these figures as a benchmark against which…
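As a rough illustration of the kind of benchmark figure the abstract refers to, the sketch below computes per-class listener accuracy from forced-choice responses and compares it with chance. It is a minimal sketch under stated assumptions, not the authors' procedure: the forced-choice format, the class labels, the helper names `per_class_accuracy` and `chance_level`, and the toy responses are all hypothetical.

```python
from collections import defaultdict


def per_class_accuracy(responses):
    """Return the fraction of correct listener choices for each intended class.

    `responses` is an iterable of (intended_label, chosen_label) pairs.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for intended, chosen in responses:
        total[intended] += 1
        if intended == chosen:
            correct[intended] += 1
    return {label: correct[label] / total[label] for label in total}


def chance_level(num_choices):
    """Accuracy expected from uniform random guessing among `num_choices` options."""
    return 1.0 / num_choices


# Hypothetical toy data: class labels and responses are invented for illustration.
responses = [
    ("anger", "anger"),
    ("anger", "anger"),
    ("anger", "sadness"),
    ("sadness", "sadness"),
    ("sadness", "sadness"),
    ("fear", "joy"),
    ("fear", "fear"),
]

accuracy = per_class_accuracy(responses)
chance = chance_level(4)  # assumes four answer options per trial

for label, acc in sorted(accuracy.items()):
    status = "above" if acc > chance else "at or below"
    print(f"{label}: {acc:.2f} ({status} chance {chance:.2f})")
```

Under this framing, the same accuracy computed on synthesized speech could be set against the natural-speech figure for the same class, which is one plausible way to read a listener-based benchmark.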

Taxonomy

Topics: Phonetics and Phonology Research · Speech Recognition and Synthesis · Speech and Audio Processing