Pronunciation Editing for Finnish Speech using Phonetic Posteriorgrams

Zirui Li; Lauri Juvela; Mikko Kurimo

arXiv:2507.02115 · eess.AS · February 10, 2026

TL;DR

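This paper introduces PPG2Speech, a diffusion-based multispeaker model that converts Phonetic Posteriorgrams (PPGs) to speech and can edit a single phoneme in Finnish speech without text alignment. Editing native speech in this way approximates second-language (L2) speech for a low-resource language that lacks L2 synthesis datasets, with reported gains in naturalness and speaker similarity.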

Contribution

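The paper presents a phoneme editing method for low-resource languages such as Finnish. It maps Phonetic Posteriorgrams to mel-spectrograms with the Matcha-TTS flow-matching decoder, strengthened with Classifier-free Guidance (CFG) and Sway Sampling, and it proposes a new task-specific objective evaluation metric, Phonetic Aligned Consistency (PAC).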

Findings

01. Effective phoneme editing demonstrated on Finnish with 60 hours of data

02. Improved naturalness and speaker similarity over TTS-based editing

03. The proposed Phonetic Aligned Consistency (PAC) metric correlates well with perceived editing quality

Abstract

Synthesizing second-language (L2) speech is potentially highly valuable for the L2 learning experience and for learner feedback. However, because L2 speech synthesis datasets are lacking, it is difficult to synthesize L2 speech for low-resource languages. In this paper, we provide a practical solution for editing native speech to approximate L2 speech and present PPG2Speech, a diffusion-based multispeaker Phonetic-Posteriorgrams-to-Speech model that is capable of editing a single phoneme without text alignment. We use Matcha-TTS's flow-matching decoder as the backbone, transforming Phonetic Posteriorgrams (PPGs) to mel-spectrograms conditioned on external speaker embeddings and pitch. PPG2Speech strengthens Matcha-TTS's flow-matching decoder with Classifier-free Guidance (CFG) and Sway Sampling. We also propose a new task-specific objective evaluation metric, the Phonetic Aligned Consistency…
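
The three ingredients named in the abstract (editing a phoneme directly in the PPG, Classifier-free Guidance, and Sway Sampling) can be illustrated with a minimal pure-Python sketch. Everything here is an assumption made for illustration, not the authors' implementation: the function names `edit_ppg`, `cfg_combine`, and `sway_timesteps` are hypothetical, the edit rule (moving posterior mass from one phoneme class to another over a frame range) is one plausible way to edit a PPG, the guidance rule is the commonly used extrapolation form, and the Sway Sampling schedule is the form published with F5-TTS. The paper may define any of these differently.

```python
import math
from typing import List

Matrix = List[List[float]]


def edit_ppg(ppg: Matrix, start: int, end: int, src: int, tgt: int) -> Matrix:
    """Return a copy of a PPG (frames x phoneme classes) in which the
    posterior mass of class `src` is moved to class `tgt` for frames
    [start, end). Assumed edit rule; rows still sum to the same total."""
    edited = [row[:] for row in ppg]  # copy so the input stays untouched
    for t in range(start, end):
        edited[t][tgt] += edited[t][src]
        edited[t][src] = 0.0
    return edited


def cfg_combine(v_cond: List[float], v_uncond: List[float], w: float) -> List[float]:
    """Classifier-free guidance in the common extrapolation form:
    v = v_cond + w * (v_cond - v_uncond). w = 0 disables guidance."""
    return [c + w * (c - u) for c, u in zip(v_cond, v_uncond)]


def sway_timesteps(n_steps: int, s: float = -1.0) -> List[float]:
    """Warp a uniform grid u in [0, 1] with the Sway Sampling schedule
    t = u + s * (cos(pi/2 * u) - 1 + u) (form assumed from F5-TTS).
    Negative s places more solver steps near t = 0."""
    grid = []
    for i in range(n_steps + 1):
        u = i / n_steps
        grid.append(u + s * (math.cos(math.pi / 2 * u) - 1 + u))
    return grid


# Toy PPG: 3 frames, 3 phoneme classes (hypothetical values).
ppg = [[0.9, 0.1, 0.0],
       [0.8, 0.1, 0.1],
       [0.1, 0.9, 0.0]]
edited = edit_ppg(ppg, start=0, end=2, src=0, tgt=2)
timesteps = sway_timesteps(4, s=-1.0)
guided = cfg_combine([1.0, 2.0], [0.0, 1.0], w=2.0)
```

In this sketch the edit needs only frame indices and phoneme class indices, which is the sense in which a PPG-level edit can avoid text alignment. The warped `timesteps` grid would replace the uniform grid of an ODE solver, and `cfg_combine` would be applied to the decoder's conditional and unconditional outputs at each step.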
