On Investigation of Unsupervised Speech Factorization Based on Normalization Flow

Haoran Sun; Yunqi Cai; Lantian Li; Dong Wang

arXiv:1910.13288 · cs.SD · October 30, 2019 · 1 citation

Open Access

TL;DR

This paper explores unsupervised speech factorization with a normalization flow model and reports preliminary evidence that factors such as phonetic content and speaker traits can be separated in the model's latent space.

Contribution

The paper applies normalization flow to speech factorization and identifies properties of the latent space, namely denseness and pseudo linearity, that facilitate disentangling speech attributes.

Findings

01. Latent space exhibits denseness and pseudo linearity.

02. Phonetic content and speaker traits are represented as specific directions.

03. Preliminary results on TIMIT show promising factorization properties.
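
The idea that a factor corresponds to a direction in the latent space, and that the space is roughly linear, can be illustrated with a small sketch. Everything here is an assumption for illustration: the summary does not say how directions are estimated, so the difference of class means is a swapped-in, commonly used heuristic, and the helper names (`factor_direction`, `interpolate`, `shift_along`) and the toy 2-D codes are hypothetical.

```python
import numpy as np


def factor_direction(z, labels, a, b):
    """Unit vector pointing from the mean code of class `a` to that of class `b`.

    Assumption: a class-mean difference is used as the factor direction;
    the source does not specify the estimator.
    """
    z = np.asarray(z, dtype=float)
    labels = np.asarray(labels)
    diff = z[labels == b].mean(axis=0) - z[labels == a].mean(axis=0)
    return diff / np.linalg.norm(diff)


def interpolate(z_start, z_end, steps):
    """Codes evenly spaced on the straight line between two latent codes."""
    z_start = np.asarray(z_start, dtype=float)
    z_end = np.asarray(z_end, dtype=float)
    alphas = np.linspace(0.0, 1.0, steps)[:, None]
    return (1.0 - alphas) * z_start + alphas * z_end


def shift_along(z, direction, amount):
    """Move a latent code by `amount` along a (unit) factor direction."""
    return np.asarray(z, dtype=float) + amount * np.asarray(direction, dtype=float)


# Toy latent codes: the first axis separates two hypothetical speakers.
codes = np.array([[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]])
speakers = np.array([0, 0, 1, 1])

direction = factor_direction(codes, speakers, a=0, b=1)
path = interpolate(codes[0], codes[3], steps=3)
moved = shift_along(codes[0], direction, amount=4.0)
```

In an actual flow model, each code on `path` or the shifted code `moved` would be mapped back to a speech segment with the inverse transform; that step is omitted here.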

Abstract

Speech signals are complex composites of various information, including phonetic content, speaker traits, channel effect, etc. Decomposing this complicated mixture into independent factors, i.e., speech factorization, is fundamentally important and plays the central role in many important algorithms of modern speech processing tasks. In this paper, we present a preliminary investigation on unsupervised speech factorization based on the normalization flow model. This model constructs a complex invertible transform, by which we can project speech segments into a latent code space where the distribution is a simple diagonal Gaussian. Our preliminary investigation on the TIMIT database shows that this code space exhibits favorable properties such as denseness and pseudo linearity, and perceptually important factors such as phonetic content and speaker trait can be represented as particular…
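
The abstract's "invertible transform" into a Gaussian code space can be sketched with a toy flow. This is a minimal illustration under stated assumptions, not the authors' model: the affine coupling layer (RealNVP-style) is one common building block chosen here because the summary does not name an architecture, the weights are random and untrained, a standard normal stands in for the diagonal Gaussian, and all names (`AffineCoupling`, `build_flow`, `flow_forward`, `flow_inverse`, `log_likelihood`) are hypothetical.

```python
import numpy as np


class AffineCoupling:
    """One affine coupling layer with fixed, untrained weights (assumption)."""

    def __init__(self, dim, rng, flip=False):
        assert dim % 2 == 0, "this sketch assumes an even feature dimension"
        self.half = dim // 2
        self.flip = flip  # alternate which half gets transformed across layers
        # Hypothetical linear stand-ins for the scale and shift networks.
        self.W_s = 0.1 * rng.standard_normal((self.half, self.half))
        self.W_t = 0.1 * rng.standard_normal((self.half, self.half))

    def _scale_shift(self, a):
        return np.tanh(a @ self.W_s), a @ self.W_t

    def forward(self, x):
        if self.flip:
            x = x[:, ::-1]
        a, b = x[:, :self.half], x[:, self.half:]
        s, t = self._scale_shift(a)
        z = np.concatenate([a, b * np.exp(s) + t], axis=1)
        if self.flip:
            z = z[:, ::-1]
        # log|det Jacobian| of a coupling layer is the sum of the log-scales.
        return z, s.sum(axis=1)

    def inverse(self, z):
        if self.flip:
            z = z[:, ::-1]
        a, zb = z[:, :self.half], z[:, self.half:]
        s, t = self._scale_shift(a)
        x = np.concatenate([a, (zb - t) * np.exp(-s)], axis=1)
        if self.flip:
            x = x[:, ::-1]
        return x


def build_flow(dim, n_layers, seed=0):
    rng = np.random.default_rng(seed)
    return [AffineCoupling(dim, rng, flip=(i % 2 == 1)) for i in range(n_layers)]


def flow_forward(layers, x):
    """Map inputs to latent codes and accumulate log|det J| per sample."""
    logdet = np.zeros(x.shape[0])
    for layer in layers:
        x, ld = layer.forward(x)
        logdet = logdet + ld
    return x, logdet


def flow_inverse(layers, z):
    """Map latent codes back to the input space (exact inverse)."""
    for layer in reversed(layers):
        z = layer.inverse(z)
    return z


def log_likelihood(layers, x):
    """Change of variables: log p(x) = log N(z; 0, I) + log|det dz/dx|."""
    z, logdet = flow_forward(layers, x)
    d = z.shape[1]
    log_pz = -0.5 * np.sum(z ** 2, axis=1) - 0.5 * d * np.log(2.0 * np.pi)
    return log_pz + logdet
```

A typical use would be `z, _ = flow_forward(layers, x)` to obtain codes for a batch of feature vectors `x`, and `flow_inverse(layers, z)` to recover the inputs. Training would maximize `log_likelihood` over real speech segments, which this sketch does not do.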

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing