ADEPT: A Dataset for Evaluating Prosody Transfer
Alexandra Torresquintero, Tian Huey Teh, Christopher G. R. Wallis, Marlene Staib, Devang S Ram Mohan, Vivian Hu, Lorenzo Foglianti, Jiameng Gao, Simon King

TL;DR
This paper introduces ADEPT, a new dataset designed to evaluate prosody transfer in text-to-speech systems, providing a benchmark and methodology for measuring how well models replicate nuanced prosodic features.
Contribution
The paper presents ADEPT, a dataset of prosodically varied natural speech samples, together with an evaluation framework for assessing prosody transfer in TTS models, addressing the lack of standardized benchmarks.
Findings
Listeners can distinguish prosodic variations with reasonable accuracy.
Listener accuracy on the natural reference samples is reported as a benchmark for judging prosody transfer.
Evaluation of two TTS models demonstrates the dataset's utility.
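As a rough illustration of the first two findings, the sketch below tallies how often listeners pick the intended prosodic variant, grouped by prosodic class. The forced-choice response format, the function name accuracy_by_class, the variant labels, and the toy responses are all assumptions made for this example; they are not ADEPT data or the authors' evaluation code.

```python
from collections import defaultdict


def accuracy_by_class(responses):
    """Return the fraction of correct listener choices per prosodic class.

    responses: iterable of (prosodic_class, intended_label, chosen_label).
    This tuple layout is a hypothetical format, not the ADEPT file format.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for prosodic_class, intended, chosen in responses:
        total[prosodic_class] += 1
        if intended == chosen:
            correct[prosodic_class] += 1
    return {cls: correct[cls] / total[cls] for cls in total}


# Made-up listener responses, for illustration only (not ADEPT results).
toy_responses = [
    ("emotion", "variant_a", "variant_a"),
    ("emotion", "variant_b", "variant_a"),
    ("marked_tonicity", "variant_a", "variant_a"),
    ("marked_tonicity", "variant_b", "variant_b"),
]

scores = accuracy_by_class(toy_responses)
print(scores)
```

Under this framing, the same tally run on synthesized samples could be compared with the accuracy obtained on natural speech, which is how a benchmark of this kind would typically be used.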
Abstract
Text-to-speech is now able to achieve near-human naturalness and research focus has shifted to increasing expressivity. One popular method is to transfer the prosody from a reference speech sample. There have been considerable advances in using prosody transfer to generate more expressive speech, but the field lacks a clear definition of what successful prosody transfer means and a method for measuring it. We introduce a dataset of prosodically-varied reference natural speech samples for evaluating prosody transfer. The samples include global variations reflecting emotion and interpersonal attitude, and local variations reflecting topical emphasis, propositional attitude, syntactic phrasing and marked tonicity. The corpus only includes prosodic variations that listeners are able to distinguish with reasonable accuracy, and we report these figures as a benchmark against which…
Taxonomy
Topics: Phonetics and Phonology Research · Speech Recognition and Synthesis · Speech and Audio Processing
