A Human-in-the-Loop Approach to Improving Cross-Text Prosody Transfer

Himanshu Maurya; Atli Sigurgeirsson

arXiv:2406.06601 · cs.CL · June 12, 2024

Open Access

TL;DR

This paper introduces a human-in-the-loop method for cross-text prosody transfer in TTS systems: users adjust salient correlates of prosody so that the transferred prosody better suits the target text while preserving the overall prosodic effect of the reference.

Contribution

The paper presents a human-in-the-loop approach that improves cross-text prosody transfer through user adjustments, addressing the difficulty existing models have in separating prosody from text when the reference and target texts differ.

Findings

01

Human-adjusted renditions are rated as more appropriate for the target text 57.8% of the time

02

Limited user effort suffices for these improvements

03

Closeness in the latent reference space is not a reliable measure of prosodic similarity

Abstract

Text-To-Speech (TTS) prosody transfer models can generate varied prosodic renditions for the same text by conditioning on a reference utterance. These models are trained with a reference that is identical to the target utterance. But when the reference utterance differs from the target text, as in cross-text prosody transfer, these models struggle to separate prosody from text, resulting in reduced perceived naturalness. To address this, we propose a Human-in-the-Loop (HitL) approach. HitL users adjust salient correlates of prosody to make the prosody more appropriate for the target text, while maintaining the overall reference prosodic effect. Human-adjusted renditions maintain the reference prosody while being rated as more appropriate for the target text 57.8% of the time. Our analysis suggests that limited user effort suffices for these improvements, and that closeness in the…

Taxonomy

Topics: Speech and dialogue systems · Speech Recognition and Synthesis · Phonetics and Phonology Research