Speech waveform synthesis from MFCC sequences with generative adversarial networks

Lauri Juvela, Bajibabu Bollepalli, Xin Wang, Hirokazu Kameoka, Manu Airaksinen, Junichi Yamagishi, Paavo Alku

arXiv:1804.00920·eess.AS·April 4, 2018

TL;DR

This paper presents a method for synthesizing high-quality speech from MFCC sequences. It combines neural prediction of fundamental frequency and voicing, conversion of the MFCC spectral envelope to all-pole filters, and GAN-based noise modeling, enabling realistic speech reconstruction from features usually considered insufficient for synthesis.

Contribution

It introduces a method that synthesizes speech from MFCCs alone, features generally considered unusable for speech synthesis, by combining neural networks with a GAN-based noise model.

Findings

01. High-quality speech can be reconstructed from MFCCs.

02. The GAN-based noise model improves the naturalness of synthesized speech.

03. The method outperforms traditional approaches in speech quality metrics.

Abstract

This paper proposes a method for generating speech from filterbank mel-frequency cepstral coefficients (MFCCs), which are widely used in speech applications such as automatic speech recognition (ASR) but are generally considered unusable for speech synthesis. First, we predict fundamental frequency and voicing information from MFCCs with an autoregressive recurrent neural network. Second, the spectral envelope information contained in MFCCs is converted to all-pole filters, and a pitch-synchronous excitation model matched to these filters is trained. Finally, we introduce a generative adversarial network (GAN)-based noise model to add a realistic high-frequency stochastic component to the modeled excitation signal. The results show that high-quality speech reconstruction can be obtained given only MFCC information at test time.
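The second step above, converting the spectral envelope contained in MFCCs to all-pole filters, can be illustrated with a minimal sketch. This is not the authors' implementation, and the conversion used in the paper may differ. The sketch assumes that the MFCCs are a truncated orthonormal DCT-II of natural-log mel filterbank energies on the HTK mel scale. It approximates the power spectrum by linearly interpolating the recovered mel energies between filter center frequencies. It then takes the inverse FFT of that spectrum to obtain an autocorrelation sequence (Wiener-Khinchin theorem) and solves for the filter with the Levinson-Durbin recursion. The function names `mfcc_to_allpole` and `levinson_durbin` are hypothetical, as are the defaults (40 mel bands, 16 kHz sampling, 512-point FFT, filter order 20).

```python
import numpy as np
from scipy.fft import idct


def hz_to_mel(f):
    # HTK-style mel scale (assumption; the paper's exact mel definition may differ)
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations; returns A(z) coefficients (a[0] = 1) and error power."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a_prev = a.copy()
        a[1:i] = a_prev[1:i] + k * a_prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def mfcc_to_allpole(mfcc, n_mels=40, sr=16000, n_fft=512, order=20):
    """Hypothetical MFCC -> all-pole envelope conversion (sketch, not the paper's code).

    Assumes mfcc = truncated orthonormal DCT-II of natural-log mel energies.
    """
    # 1) Zero-pad the truncated cepstrum and invert the DCT -> smoothed log mel energies
    c = np.zeros(n_mels)
    c[: len(mfcc)] = mfcc
    log_mel = idct(c, type=2, norm="ortho")
    # 2) Assign each band to its filter's center frequency, interpolate onto FFT bins
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    centers_hz = mel_to_hz(mel_pts[1:-1])
    freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    power = np.exp(np.interp(freqs, centers_hz, log_mel))
    # 3) Wiener-Khinchin: autocorrelation = inverse FFT of the power spectrum
    r = np.fft.irfft(power, n=n_fft)[: order + 1]
    # 4) Levinson-Durbin gives the all-pole filter 1/A(z) and its prediction-error power
    return levinson_durbin(r, order)
```

For example, `mfcc_to_allpole(np.zeros(20))` corresponds to a flat log spectrum. It returns `a = [1, 0, ..., 0]` with a prediction-error power of 1, which describes a flat envelope. The sketch works in the autocorrelation domain because a strictly positive power spectrum always yields a stable (minimum-phase) all-pole filter. Stability matters once the filter is driven by an excitation signal.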

Code & Models

Repositories

ljuvela/ResGAN (TensorFlow, official)