RephraseTTS: Dynamic Length Text based Speech Insertion with Speaker Style Transfer

Neeraj Matiyali; Siddharth Srivastava; Gaurav Sharma

arXiv:2508.17031·cs.SD·August 26, 2025

TL;DR

RephraseTTS introduces a transformer-based, non-autoregressive method for text-conditioned speech insertion that dynamically determines speech length, preserves speaker style, and outperforms existing adaptive TTS baselines.

Contribution

RephraseTTS is presented as the first method to enable variable-length speech insertion conditioned on text and partial speech while maintaining the speaker's style and prosody.

Findings

01. Outperforms baselines based on an existing adaptive TTS method in experiments on LibriTTS

02. Determines the length of the inserted speech dynamically during inference

03. Produces high-quality speech insertions, as confirmed by a user study

Abstract

We propose a method for the task of text-conditioned speech insertion, i.e., inserting a speech sample into an input speech sample, conditioned on the corresponding complete text transcript. An example use case of the task would be to update the speech audio when corrections are made to the corresponding text transcript. The proposed method follows a transformer-based non-autoregressive approach that allows speech insertions of variable lengths, which are dynamically determined during inference, based on the text transcript and tempo of the available partial input. It is capable of maintaining the speaker's voice characteristics, prosody and other spectral properties of the available speech input. Results from our experiments and user study on LibriTTS show that our method outperforms baselines based on an existing adaptive text-to-speech method. We also provide numerous qualitative…
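The abstract says the insertion length is determined from the text transcript and the tempo of the available partial input. The sketch below illustrates that idea with a simple rate-based heuristic and is not the authors' model: it assumes tempo can be summarized as average frames per text token in the existing speech. The function name `estimate_insert_frames` and all numbers are hypothetical.

```python
def estimate_insert_frames(context_frames: int, context_tokens: int, insert_tokens: int) -> int:
    """Estimate how many acoustic frames an inserted text span needs.

    Illustrative heuristic only (an assumption, not the paper's method):
    the tempo of the surrounding speech is taken as frames per text token.
    """
    if context_frames <= 0 or context_tokens <= 0:
        raise ValueError("context must contain at least one frame and one token")
    if insert_tokens < 0:
        raise ValueError("insert_tokens must be non-negative")
    # Tempo of the available speech, in frames per text token.
    frames_per_token = context_frames / context_tokens
    # Scale by the length of the new text and round to whole frames.
    return round(frames_per_token * insert_tokens)


# Hypothetical numbers: 400 frames of existing speech covering 50 tokens,
# and 12 tokens of new text to insert.
n_frames = estimate_insert_frames(context_frames=400, context_tokens=50, insert_tokens=12)
print(n_frames)
```

A learned model would presumably predict per-token durations rather than a single global rate, but the heuristic shows why a faster speaker should receive a shorter insertion for the same text.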
