Learning and controlling the source-filter representation of speech with a variational autoencoder

Samir Sadok; Simon Leglaive; Laurent Girin; Xavier Alameda-Pineda; Renaud Séguier

arXiv:2204.07075·cs.SD·March 22, 2023

TL;DR

This paper demonstrates how a variational autoencoder trained on speech data can naturally learn orthogonal subspaces corresponding to source-filter speech components, enabling independent control and analysis of speech features like pitch and formants.

Contribution

The work shows that source-filter speech representations emerge as orthogonal subspaces in a VAE's latent space, allowing for unsupervised learning and independent manipulation of speech features.

Findings

01

Latent subspaces for $f_0$ and formants are orthogonal.

02

The proposed method accurately controls speech features using only a few seconds of labeled synthetic speech and no labeled natural speech.

03

Introduces a robust $f_0$ estimation technique based on latent space projections.

Abstract

Understanding and controlling latent representations in deep generative models is a challenging yet important problem for analyzing, transforming and generating various types of data. In speech processing, inspired by the anatomical mechanisms of phonation, the source-filter model considers that speech signals are produced from a few independent and physically meaningful continuous latent factors, among which the fundamental frequency $f_{0}$ and the formants are of primary importance. In this work, we start from a variational autoencoder (VAE) trained in an unsupervised manner on a large dataset of unlabeled natural speech signals, and we show that the source-filter model of speech production naturally arises as orthogonal subspaces of the VAE latent space. Using only a few seconds of labeled speech signals generated with an artificial speech synthesizer, we propose a method to…
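
To make the idea of "orthogonal subspaces of the latent space" concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the latent vectors are random toy data standing in for VAE encodings, PCA is used as an assumed way to estimate each subspace, and the function names (`pca_basis`, `subspace_overlap`, `move_in_subspace`) are hypothetical. It shows how one might measure whether two factor subspaces are orthogonal and how to change one factor's coordinates while leaving the other untouched.

```python
import numpy as np


def pca_basis(latents, n_components):
    """Return an orthonormal basis (as columns) of the top principal directions."""
    centered = latents - latents.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:n_components].T


def subspace_overlap(basis_a, basis_b):
    """Largest cosine of the principal angles between two subspaces.

    0 means the subspaces are orthogonal, 1 means they share a direction.
    """
    singular_values = np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)
    return float(singular_values[0])


def move_in_subspace(z, basis, new_coords, mean):
    """Set the coordinates of z inside span(basis) to new_coords, keep the rest."""
    inside = basis @ (basis.T @ (z - mean))
    return z - inside + basis @ new_coords


# Toy stand-in for VAE latents (assumption: a 16-dimensional latent space).
rng = np.random.default_rng(0)
dim, n = 16, 200
noise = 0.01

# Hypothetical data where only f0 varies: variance concentrated on axes 0-1.
z_f0 = noise * rng.normal(size=(n, dim))
z_f0[:, :2] += rng.normal(size=(n, 2))

# Hypothetical data where only one formant varies: variance on axes 2-4.
z_formant = noise * rng.normal(size=(n, dim))
z_formant[:, 2:5] += rng.normal(size=(n, 3))

basis_f0 = pca_basis(z_f0, n_components=2)
basis_formant = pca_basis(z_formant, n_components=3)

# Close to 0 when the two factor subspaces are (nearly) orthogonal.
overlap = subspace_overlap(basis_f0, basis_formant)

# Change the f0 coordinates of one latent vector and leave the rest alone.
mean_f0 = z_f0.mean(axis=0)
z = z_formant[0] + z_f0[0]
target = np.array([1.0, -0.5])
z_new = move_in_subspace(z, basis_f0, target, mean_f0)

# Formant coordinates before and after the edit, for comparison.
formant_shift = np.abs(basis_formant.T @ (z_new - z)).max()
```

In this toy setting `overlap` and `formant_shift` are both close to zero, which is the property the paper reports for the learned $f_0$ and formant subspaces. With a real model, `z_f0` and `z_formant` would be replaced by encodings of signals in which only one factor varies.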

Code & Models

Repositories

samsad35/source-filter-vae
PyTorch · Official
