MFCCGAN: A Novel MFCC-Based Speech Synthesizer Using Adversarial Learning

Mohammad Reza Hasanabadi Majid Behdad Davood Gharavian

arXiv:2306.12785 · cs.SD · October 26, 2023

TL;DR

MFCCGAN is a GAN-based speech synthesizer that converts MFCC features into raw speech waveforms. It outperforms rule-based baselines (Librosa MFCC inversion and the WORLD vocoder) in intelligibility and naturalness.

Contribution

The paper presents MFCCGAN, a GAN-based model that synthesizes raw speech waveforms directly from MFCCs, improving on the rule-based Librosa MFCC inversion and the WORLD vocoder in quality and intelligibility.

Findings

01

Outperforms Librosa MFCC inversion by about 26% to 53% in STOI and about 16% to 78% in NISQA score

02

Achieves about 10% higher intelligibility and about 4% higher naturalness than the WORLD vocoder, which also needs additional data such as F0

03

A STOI-based perceptual loss in the discriminators could improve speech quality
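
The percentage figures in these findings can be read as relative gains over a baseline score. The sketch below shows that arithmetic only. It assumes the percentages are relative increases, which this summary does not state explicitly, and the function name and both scores are hypothetical values chosen for illustration, not results from the paper.

```python
def relative_gain(baseline: float, proposed: float) -> float:
    """Return the relative increase of `proposed` over `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return 100.0 * (proposed - baseline) / baseline


# Hypothetical STOI scores (NOT taken from the paper), used only to show the formula.
stoi_baseline = 0.60   # assumed score of a baseline synthesizer
stoi_proposed = 0.78   # assumed score of a proposed synthesizer

gain = relative_gain(stoi_baseline, stoi_proposed)
print(f"relative STOI gain: {gain:.1f}%")
```

With these made-up inputs the gain is about 30%, since (0.78 - 0.60) / 0.60 = 0.3.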

Abstract

In this paper, we introduce MFCCGAN, a novel speech synthesizer based on adversarial learning that takes MFCCs as input and generates raw speech waveforms. Benefiting from the capabilities of GANs, it produces speech with higher intelligibility than WORLD, a rule-based MFCC-based speech synthesizer. We evaluated the model using a popular intrusive objective speech intelligibility measure (STOI) and a quality measure (NISQA score). Experimental results show that the proposed system outperforms Librosa MFCC inversion (by about 26% to 53% in STOI and about 16% to 78% in NISQA score) and achieves gains of about 10% in intelligibility and about 4% in naturalness over the conventional rule-based vocoder WORLD, which is used in the CycleGAN-VC family. However, WORLD needs additional data such as F0. Finally, using perceptual loss in discriminators based on STOI could improve the quality…
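
As general background (not taken from the paper), MFCCs are commonly computed as a DCT of log mel-filterbank energies, keeping only the first few coefficients. The sketch below illustrates why inverting them is lossy: once the higher coefficients are dropped, the log mel spectrum can only be recovered approximately. The frame values, the 20 mel bands, and the 13 kept coefficients are assumptions chosen for illustration, and the helper names are hypothetical. It is not the MFCCGAN model or the Librosa implementation.

```python
import math


def dct2(x):
    """Orthonormal DCT-II of a list of floats."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct2(coeffs, n):
    """Inverse of dct2 for length n; coefficients beyond len(coeffs) count as zero."""
    out = []
    for i in range(n):
        s = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            s += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(s)
    return out


def max_abs_error(a, b):
    """Largest element-wise absolute difference between two sequences."""
    return max(abs(x - y) for x, y in zip(a, b))


# Hypothetical log mel energies for one frame (20 bands, made-up values).
log_mel = [math.log(1.0 + (i % 5) + 0.3 * i) for i in range(20)]

full = dct2(log_mel)   # all 20 cepstral coefficients
mfcc = full[:13]       # keep the first 13, an assumed but common choice

err_full = max_abs_error(log_mel, idct2(full, len(log_mel)))
err_truncated = max_abs_error(log_mel, idct2(mfcc, len(log_mel)))

print(f"reconstruction error, all coefficients: {err_full:.2e}")
print(f"reconstruction error, 13 coefficients:  {err_truncated:.2e}")
```

With all coefficients the round trip is exact up to floating-point error. With the truncated set a residual remains, and phase and pitch information are missing on top of that, which is the gap a learned synthesizer has to fill.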

Code & Models

Repositories

mohammdreza2020/mfccgan
Official
