Speech Editing -- a Summary
Tobias Kässmann, Yining Liu, Danni Liu

TL;DR
This paper reviews recent advances in text-based speech editing, in which audio is modified by editing its transcript. It emphasizes improvements in quality, key metrics, and open challenges, with the aim of inspiring future research.
Contribution
It provides a comprehensive review of state-of-the-art speech editing methods, comparing metrics and datasets, and discusses recent innovations like context-aware prosody correction.
Findings
Recent methods aim to produce high-quality edits in which the edited audio is indistinguishable from the original recording.
Advancements include context-aware prosody correction and attention mechanisms.
The paper identifies ongoing challenges and future research directions.
Abstract
With the rise of video production and social media, speech editing has become crucial for creators to address issues like mispronunciations, missing words, or stuttering in audio recordings. This paper explores text-based speech editing methods that modify audio via text transcripts without manual waveform editing. These approaches ensure edited audio is indistinguishable from the original by altering the mel-spectrogram. Recent advancements, such as context-aware prosody correction and advanced attention mechanisms, have improved speech editing quality. This paper reviews state-of-the-art methods, compares key metrics, and examines widely used datasets. The aim is to highlight ongoing issues and inspire further research and innovation in speech editing.
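To make the "edit the transcript, not the waveform" idea concrete, here is a minimal sketch of the first step such a system needs: finding which word spans changed between the original and edited transcripts. It uses only Python's standard `difflib`. The function name `edit_regions` is hypothetical and is not taken from the paper. Mapping the returned word spans to mel-spectrogram frames would additionally require a word-level alignment, which is assumed and not shown here.

```python
from difflib import SequenceMatcher


def edit_regions(original: str, edited: str):
    """Return (op, original_word_span, replacement_words) for each changed region.

    Hypothetical helper for illustration only: it diffs two transcripts at the
    word level. A real speech editor would then regenerate only the audio
    that corresponds to the changed spans.
    """
    src = original.split()
    dst = edited.split()
    matcher = SequenceMatcher(a=src, b=dst, autojunk=False)
    regions = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            # (i1, i2) is a half-open word-index span in the original transcript.
            regions.append((op, (i1, i2), dst[j1:j2]))
    return regions


if __name__ == "__main__":
    print(edit_regions("the quick brown fox jumps", "the quick red fox jumps"))
```

Unchanged spans are skipped, so the surrounding audio can be kept as recorded and only the reported regions need to be synthesized.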
Taxonomy
Methods: Softmax · Attention Is All You Need
