Mathematical Vocoder Algorithm: Modified Spectral Inversion for Efficient Neural Speech Synthesis

Hyun Gon Ryu; Jeong-Hoon Kim; Simon See

arXiv:2106.03167 · eess.AS · June 17, 2021

Open Access

TL;DR

This paper introduces a novel mathematical vocoder algorithm, modified spectral inversion, that synthesizes high-fidelity speech efficiently without training a neural vocoder, enabling fast, language-agnostic speech synthesis.

Contribution

It presents a non-data-driven vocoder method that removes the neural vocoder training stage, allowing rapid, high-quality speech synthesis that applies across languages and unseen voices.

Findings

01

Synthesizes speech at approximately 20 MHz (million samples per second) on CPU and 59.6 MHz on GPU

02

Runs 909x (CPU) and 2,702x (GPU) faster than real time

03

Applicable to unseen voices and multiple languages without retraining
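The two speed figures can be cross-checked with simple arithmetic: a real-time factor is the number of output samples generated per second divided by the audio sampling rate, so dividing each reported throughput by its reported factor recovers the sampling rate it implies. The sketch below does only that; the function name `implied_sample_rate` is hypothetical, and the sampling rate itself is not stated on this page.

```python
# Reported figures from the abstract: (throughput in samples/s, speed-up vs. real time).
REPORTED = {"CPU": (20.0e6, 909), "GPU": (59.6e6, 2702)}


def implied_sample_rate(samples_per_second: float, real_time_factor: float) -> float:
    """Audio sampling rate implied by a throughput and a real-time factor."""
    # real_time_factor = samples_per_second / sample_rate, solved for sample_rate.
    return samples_per_second / real_time_factor


for device, (throughput, rtf) in REPORTED.items():
    print(f"{device}: {implied_sample_rate(throughput, rtf):,.0f} Hz")
```

Both results come out near 22 kHz (about 22,002 Hz and 22,058 Hz), close to the common 22,050 Hz speech-synthesis rate. This is consistent with reading "MHz" as millions of samples per second; the exact sampling rate used in the paper should be confirmed from the PDF.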

Abstract

In this work, we propose a new mathematical vocoder algorithm (modified spectral inversion) that generates a waveform from acoustic features without phase estimation. The main benefit of the proposed method is that it removes the neural vocoder training stage from an end-to-end speech synthesis pipeline. Our implementation can synthesize high-fidelity speech at approximately 20 MHz on CPU and 59.6 MHz on GPU, which is 909 and 2,702 times faster than real time, respectively. Since the proposed method is not data-driven, it can be applied to unseen voices and multiple languages without any additional work. We expect the method to support research on neural network models capable of synthesizing speech at studio-recording quality.
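The abstract's central claim is that the waveform is recovered without phase estimation. This page does not describe how modified spectral inversion works, so the sketch below illustrates only the general principle it is contrasted with: phase-estimating approaches such as Griffin-Lim must search for a phase that fits a magnitude spectrogram, whereas if the feature keeps the complex short-time Fourier transform (magnitude and phase together), a direct inverse STFT returns the waveform in one pass. This is not the authors' algorithm; the function names `stft` and `istft`, the FFT size, the hop length, and the 22,050 Hz test signal are all illustrative assumptions.

```python
import numpy as np


def stft(x: np.ndarray, n_fft: int = 1024, hop: int = 256) -> np.ndarray:
    """Complex STFT with a periodic Hann window; returns (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft + 1)[:-1]  # periodic Hann window of length n_fft
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec: np.ndarray, n_fft: int = 1024, hop: int = 256) -> np.ndarray:
    """Weighted overlap-add inverse; no phase estimation because spec is complex."""
    window = np.hanning(n_fft + 1)[:-1]
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * window  # apply synthesis window
    length = n_fft + hop * (len(frames) - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + n_fft] += frame
        norm[i * hop : i * hop + n_fft] += window ** 2
    # Dividing by the summed squared window undoes the analysis/synthesis weighting
    # wherever that sum is non-zero (i.e. everywhere except the extreme edges).
    return out / np.maximum(norm, 1e-8)


# Usage: round-trip a noisy 220 Hz tone (assumed 22,050 Hz sampling rate).
rng = np.random.default_rng(0)
sr = 22_050
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 220 * t) + 0.1 * rng.standard_normal(sr)
y = istft(stft(x))
interior = slice(1024, len(y) - 1024)  # skip edges covered by too few frames
print("max interior error:", np.max(np.abs(x[interior] - y[interior])))
```

The printed error should be at the level of floating-point rounding, which is the point of the illustration: once phase is carried in the feature, inversion is a fixed, training-free computation rather than an iterative estimate.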

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing