FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency

Rui Liu; Jiatian Xi; Ziyue Jiang; Haizhou Li

arXiv:2309.11725·cs.SD·September 25, 2023

TL;DR

FluentEditor is a novel text-based speech editing model that enhances fluency and naturalness by incorporating acoustic and prosody consistency constraints during training.

Contribution

It introduces a fluency-aware training criterion with acoustic and prosody constraints to improve speech editing quality.

Findings

01 Outperforms baselines in naturalness and fluency
02 Achieves better acoustic and prosody consistency
03 Demonstrates effectiveness on VCTK dataset

Abstract

Text-based speech editing (TSE) techniques are designed to enable users to edit the output audio by modifying the input text transcript instead of the audio itself. Despite much progress in neural network-based TSE techniques, current techniques have focused on reducing the difference between the generated speech segment and the reference target in the editing region, ignoring its local and global fluency in the context and original utterance. To maintain speech fluency, we propose a fluent speech editing model, termed FluentEditor, which adopts a fluency-aware training criterion in TSE training. Specifically, the acoustic consistency constraint aims to smooth the transition between the edited region and its neighboring acoustic segments so that it is consistent with the ground truth, while the prosody consistency constraint seeks to ensure that the prosody…
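As a rough illustration of the acoustic consistency idea described above, the sketch below penalizes mismatches between the predicted and ground-truth frame-to-frame jumps at the two boundaries of an edited region. It is a minimal sketch under stated assumptions, not the paper's formulation: the function names, the use of mel-spectrogram arrays, and the squared-error form are all hypothetical choices made for illustration.

```python
import numpy as np


def boundary_delta(mel, idx):
    """Jump between frame idx and the frame immediately before it."""
    return mel[idx] - mel[idx - 1]


def acoustic_consistency_loss(pred_mel, true_mel, start, end):
    """Hypothetical boundary-smoothness penalty (an assumption, not the paper's loss).

    pred_mel, true_mel: arrays of shape (frames, mel_bins).
    start, end: the edited region covers frames [start, end),
                with 0 < start < end < frames.
    """
    if pred_mel.shape != true_mel.shape:
        raise ValueError("pred_mel and true_mel must have the same shape")
    if not (0 < start < end < pred_mel.shape[0]):
        raise ValueError("edited region must leave context on both sides")

    # Compare the transition into the edited region with the ground truth.
    left = boundary_delta(pred_mel, start) - boundary_delta(true_mel, start)
    # Compare the transition out of the edited region with the ground truth.
    right = boundary_delta(pred_mel, end) - boundary_delta(true_mel, end)

    return float(np.mean(left ** 2) + np.mean(right ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_mel = rng.normal(size=(10, 4))
    pred_mel = true_mel.copy()
    pred_mel[3:7] += 1.0  # shift only the edited frames, creating jumps at both edges
    print(acoustic_consistency_loss(pred_mel, true_mel, 3, 7))
```

In this toy setup, a prediction that matches the ground truth at both boundaries gives zero penalty, while a constant offset applied only inside the edited region is penalized at both edges, even though a plain reconstruction loss over the edited region alone would not distinguish where the error sits.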

Code & Models

Repositories

ai-s2-lab/fluenteditor
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Natural Language Processing Techniques