A Benchmark of Dynamical Variational Autoencoders applied to Speech Spectrogram Modeling

Xiaoyu Bie; Laurent Girin; Simon Leglaive; Thomas Hueber; Xavier Alameda-Pineda

arXiv:2106.06500 · cs.SD · June 15, 2021

TL;DR

This paper benchmarks six Dynamical Variational Autoencoder models on speech spectrograms, demonstrating their potential for effective speech analysis and resynthesis by modeling temporal dependencies in sequential data.

Contribution

It provides a comprehensive experimental comparison of DVAE models applied to speech spectrograms, highlighting their capabilities and potential for speech modeling.

Findings

1. DVAEs outperform traditional VAEs in speech reconstruction.
2. Certain DVAE models better capture temporal dependencies.
3. The benchmark showcases the strengths of DVAEs for speech analysis.

Abstract

The Variational Autoencoder (VAE) is a powerful deep generative model that is now extensively used to represent high-dimensional complex data via a low-dimensional latent space learned in an unsupervised manner. In the original VAE model, input data vectors are processed independently. In recent years, a series of papers have presented different extensions of the VAE to process sequential data, that not only model the latent space, but also model the temporal dependencies within a sequence of data vectors and corresponding latent vectors, relying on recurrent neural networks. We recently performed a comprehensive review of those models and unified them into a general class called Dynamical Variational Autoencoders (DVAEs). In the present paper, we present the results of an experimental benchmark comparing six of those DVAE models on the speech analysis-resynthesis task, as an…
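
The contrast the abstract draws between the original VAE and its dynamical extensions can be sketched as two factorizations of the joint distribution over a sequence. The notation is an assumption introduced here for illustration and does not appear on this page: x_{1:T} stands for the sequence of spectrogram frames, z_{1:T} for the corresponding latent vectors, and θ for the generative parameters. The second line is the most general causal form; individual DVAE models typically drop some of these dependencies.

```latex
% Original VAE: each frame is modeled independently of the others
p_\theta(x_{1:T}, z_{1:T}) = \prod_{t=1}^{T} p_\theta(x_t \mid z_t)\, p(z_t)

% General DVAE (causal form): each frame and each latent vector
% may depend on past frames and on past latent vectors
p_\theta(x_{1:T}, z_{1:T}) = \prod_{t=1}^{T}
    p_\theta(x_t \mid x_{1:t-1}, z_{1:t})\,
    p_\theta(z_t \mid x_{1:t-1}, z_{1:t-1})
```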

Code & Models

Repositories

XiaoyuBIE1994/DVAE-speech (PyTorch, official)

Taxonomy

Topics: Music and Audio Processing · Speech Recognition and Synthesis · Speech and Audio Processing