PS-TTS: Phonetic Synchronization in Text-to-Speech for Achieving Natural Automated Dubbing

Changi Hong, Yoonah Song, Hwayoung Park, Chaewoon Bang, Dayeon Ku, Do Hyun Lee, and Hong Kook Kim

arXiv:2604.09111·eess.AS·May 5, 2026

TL;DR

This paper introduces PS-TTS and PS-Comet TTS, two methods that improve lip-sync and semantic accuracy in automated multilingual dubbing by paraphrasing the translated text and applying phonetic synchronization.

Contribution

The paper presents synchronization techniques that combine paraphrasing of the translated text, dynamic time warping over vowel distances, and semantic considerations to improve the naturalness and accuracy of AI-based dubbing systems.

Findings

1. Both systems outperform baseline TTS in objective metrics.

2. PS-Comet achieves better lip-sync and semantic preservation across languages.

3. Experiments confirm cross-linguistic applicability of the methods.

Abstract

Recently, artificial intelligence-based dubbing technology has advanced, enabling automated dubbing (AD) to convert the source speech of a video into target speech in different languages. However, natural AD still faces synchronization challenges such as duration and lip-synchronization (lip-sync), which are crucial for preserving the viewer experience. Therefore, this paper proposes a synchronization method for AD processes that paraphrases translated text, comprising two steps: isochrony for timing constraints and phonetic synchronization (PS) to preserve lip-sync. First, we achieve isochrony by paraphrasing the translated text with a language model, ensuring the target speech duration matches that of the source speech. Second, we introduce PS, which employs dynamic time warping (DTW) with local costs of vowel distances measured from training data so that the target text composes…
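
To make the phonetic synchronization idea more concrete, here is a minimal Python sketch of DTW with a vowel-distance local cost. It is an illustration under stated assumptions, not the authors' implementation: the five-vowel inventory, the numeric distances in `VOWEL_DIST`, and the function names `vowel_distance`, `dtw_cost`, and `pick_best_paraphrase` are all hypothetical. The abstract states that the real distances are measured from training data, and it is cut off before describing how the DTW result is used, so ranking candidate paraphrases by lowest cost is assumed here.

```python
from typing import Dict, List, Tuple

# Hypothetical vowel-to-vowel distances, for illustration only.
# The paper measures these from training data; the values below are made up.
VOWEL_DIST: Dict[Tuple[str, str], float] = {
    ("a", "e"): 0.4, ("a", "i"): 0.8, ("a", "o"): 0.5, ("a", "u"): 0.9,
    ("e", "i"): 0.3, ("e", "o"): 0.6, ("e", "u"): 0.8,
    ("i", "o"): 0.9, ("i", "u"): 0.7,
    ("o", "u"): 0.3,
}


def vowel_distance(a: str, b: str) -> float:
    """Symmetric local cost between two vowels (0.0 when identical)."""
    if a == b:
        return 0.0
    # Look the pair up in either order; unknown pairs get a maximal cost of 1.0.
    return VOWEL_DIST.get((a, b), VOWEL_DIST.get((b, a), 1.0))


def dtw_cost(source: List[str], target: List[str]) -> float:
    """Accumulated DTW cost between two vowel sequences.

    Uses the classic step pattern (match, insertion, deletion) with
    vowel_distance as the local cost. Returns inf if exactly one
    sequence is empty.
    """
    n, m = len(source), len(target)
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = vowel_distance(source[i - 1], target[j - 1])
            acc[i][j] = local + min(
                acc[i - 1][j],      # source vowel repeated against same target vowel
                acc[i][j - 1],      # target vowel repeated against same source vowel
                acc[i - 1][j - 1],  # one-to-one match
            )
    return acc[n][m]


def pick_best_paraphrase(source: List[str], candidates: List[List[str]]) -> List[str]:
    """Assumed selection rule: keep the candidate with the lowest DTW cost."""
    return min(candidates, key=lambda cand: dtw_cost(source, cand))
```

In this sketch, each candidate paraphrase would first be reduced to its vowel sequence (a step not shown, since it depends on a language-specific grapheme-to-phoneme tool), and the candidate whose vowels align most cheaply with those of the source speech is kept.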
