Multi-interaction TTS toward professional recording reproduction

Hiroki Kanagawa; Kenichi Fujita; Aya Watanabe; Yusuke Ijima

arXiv:2507.00808 · cs.SD · July 3, 2025

TL;DR

This paper introduces a multi-interaction TTS system that lets users iteratively refine the style of synthesized speech, emulating the interaction between a voice director and a voice actor to bring synthesis closer to professional recording practice.

Contribution

It presents a novel TTS framework that supports multi-step user interaction for style refinement, a capability that existing TTS systems lack.

Findings

1. Enables iterative style refinement aligned with user directions
2. Demonstrates effective multi-interaction capability through experiments
3. Provides a new dataset for multi-interaction TTS research

Abstract

Voice directors often iteratively refine voice actors' performances by providing feedback to achieve the desired outcome. While this iterative feedback-based refinement process is important in actual recordings, it has been overlooked in text-to-speech synthesis (TTS). As a result, fine-grained style refinement after the initial synthesis is not possible, even though the synthesized speech often deviates from the user's intended style. To address this issue, we propose a TTS method with multi-step interaction that allows users to intuitively and rapidly refine synthesized speech. Our approach models the interaction between the TTS model and its user to emulate the relationship between voice actors and voice directors. Experiments show that the proposed model with its corresponding dataset enables iterative style refinements in accordance with users' directions, thus demonstrating its…
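The contrast between single-shot synthesis and the multi-step, director-style loop described in the abstract can be illustrated with a toy sketch. It is not the authors' model and does not reproduce their dataset: `Style`, `DIRECTIONS`, and `refine` are hypothetical names, the three style attributes are made up, and each direction is a hand-written rule rather than anything learned. The only point it shows is that every turn starts from the previous take instead of from scratch.

```python
from dataclasses import dataclass, replace

# Hypothetical style attributes (assumption); the paper's actual style
# representation is not described in this summary.
@dataclass(frozen=True)
class Style:
    pitch: float = 0.0   # relative pitch offset (arbitrary units)
    rate: float = 1.0    # speaking-rate multiplier
    energy: float = 0.0  # relative loudness offset (arbitrary units)

# Hypothetical hand-written rules mapping a director's instruction to a
# change of the current take's style (a stand-in for a learned model).
DIRECTIONS = {
    "higher": lambda s: replace(s, pitch=s.pitch + 1.0),
    "lower": lambda s: replace(s, pitch=s.pitch - 1.0),
    "faster": lambda s: replace(s, rate=round(s.rate * 1.1, 3)),
    "slower": lambda s: replace(s, rate=round(s.rate / 1.1, 3)),
    "louder": lambda s: replace(s, energy=s.energy + 1.0),
    "softer": lambda s: replace(s, energy=s.energy - 1.0),
}

def refine(initial: Style, directions: list[str]) -> list[Style]:
    """Apply directions one turn at a time and keep every take.

    Each turn modifies the previous take rather than restarting from
    the initial style, mimicking iterative director feedback.
    """
    history = [initial]
    for d in directions:
        if d not in DIRECTIONS:
            raise ValueError(f"unknown direction: {d!r}")
        history.append(DIRECTIONS[d](history[-1]))
    return history

if __name__ == "__main__":
    takes = refine(Style(), ["higher", "faster", "higher"])
    for i, take in enumerate(takes):
        print(i, take)
```

Keeping the full history of takes, rather than only the latest one, mirrors a recording session in which a director can compare or return to an earlier take; whether the paper's model conditions on such a history is not stated here.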

Taxonomy

Topics: Speech Recognition and Synthesis · Voice and Speech Disorders · Music Technology and Sound Studies