Disentangling Pitch and Creak for Speaker Identity Preservation in Speech Synthesis

Frederik Rautenberg; Jana Wiechmann; Petra Wagner; Reinhold Haeb-Umbach

arXiv:2602.14686·eess.AS·February 17, 2026

TL;DR

This paper presents a speech synthesis system that modifies the perceptual voice quality of creak while preserving the speaker's perceived identity, by disentangling pitch from creak with a conditional continuous normalizing flow.

Contribution

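It disentangles pitch from creak by augmenting the training dataset of a speech synthesis system, using a speaker manipulation block based on a conditional continuous normalizing flow, so that creak can be modified while the speaker's perceived identity is preserved.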

Findings

01

Greatly improved speaker verification performance over a range of creak manipulation strengths.

02

Pitch is disentangled from creak by augmenting the training dataset of the speech synthesis system.

03

Demonstrated robustness of the method in preserving speaker identity.

Abstract

We introduce a system capable of faithfully modifying the perceptual voice quality of creak while preserving the speaker's perceived identity. While it is well known that high creak probability is typically correlated with low pitch, it is important to note that this is a property observed on a population of speakers but does not necessarily hold across all situations. Disentanglement of pitch from creak is achieved by augmentation of the training dataset of a speech synthesis system with a speaker manipulation block based on conditional continuous normalizing flow. The experiments show greatly improved speaker verification performance over a range of creak manipulation strengths.
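
The abstract reports speaker verification performance over a range of creak manipulation strengths but does not say how verification is scored. The following is a minimal sketch of one common setup, assuming that speaker embeddings are compared by cosine similarity against a fixed threshold. The function names, the threshold of 0.7, and the embedding values are hypothetical illustrations, not details taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(reference, manipulated, threshold=0.7):
    """Accept the manipulated utterance as the reference speaker.

    The threshold is an assumed value; a real system would tune it
    on held-out verification trials.
    """
    return cosine_similarity(reference, manipulated) >= threshold


def acceptance_rate_by_strength(reference, manipulated_by_strength, threshold=0.7):
    """Fraction of accepted utterances for each creak manipulation strength.

    manipulated_by_strength maps a strength value to a list of embeddings
    extracted from utterances synthesized at that strength (hypothetical).
    """
    rates = {}
    for strength, embeddings in manipulated_by_strength.items():
        if not embeddings:
            raise ValueError(f"no embeddings for strength {strength}")
        accepted = sum(same_speaker(reference, e, threshold) for e in embeddings)
        rates[strength] = accepted / len(embeddings)
    return rates
```

Under these assumptions, an acceptance rate that stays high as the manipulation strength grows would indicate that the speaker's identity survives the creak modification.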

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Voice and Speech Disorders