ESTVocoder: An Excitation-Spectral-Transformed Neural Vocoder Conditioned on Mel Spectrogram

Xiao-Hang Jiang; Hui-Peng Du; Yang Ai; Ye-Xin Lu; Zhen-Hua Ling

arXiv:2411.11258·cs.SD·November 19, 2024

TL;DR

ESTVocoder is a neural vocoder built on source-filter theory: a neural filter transforms the amplitude and phase spectra of an F0-derived excitation into the corresponding speech spectra. The spectral prior carried by the excitation, combined with adversarial training, improves speech synthesis quality and convergence speed.

Contribution

The paper introduces ESTVocoder, an excitation-spectral-transformed neural vocoder based on source-filter theory, aimed at improving speech quality and training efficiency.

Findings

01. Outperforms or matches baseline neural vocoders in speech quality.

02. Converges faster because the excitation carries a spectral prior.

03. Maintains reasonable model complexity and generation speed.

Abstract

This paper proposes ESTVocoder, a novel excitation-spectral-transformed neural vocoder within the framework of source-filter theory. ESTVocoder transforms the amplitude and phase spectra of the excitation into the corresponding speech amplitude and phase spectra using a neural filter built on a backbone of ConvNeXt v2 blocks. The speech waveform is then reconstructed through the inverse short-time Fourier transform (ISTFT). The excitation is constructed from the F0: for voiced segments it contains full harmonic information, while for unvoiced segments it is represented by noise. The excitation gives the filter prior knowledge of the amplitude and phase patterns, which is expected to reduce the modeling difficulty compared with conventional neural vocoders. To ensure the fidelity of the synthesized speech, an adversarial training strategy is applied to ESTVocoder with multi-scale…
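The excitation described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the sample rate, hop size, FFT size, noise level, equal-amplitude sine harmonics, and the function names `build_excitation` and `stft_amp_phase` are all assumptions chosen for illustration. It builds a harmonic signal for voiced frames and Gaussian noise for unvoiced frames, then extracts the amplitude and phase spectra that a neural filter would take as input.

```python
import numpy as np

def build_excitation(f0, sample_rate=16000, hop=80, noise_std=0.03, seed=0):
    """Hypothetical F0-based excitation. f0 is frame-level in Hz, 0 = unvoiced."""
    rng = np.random.default_rng(seed)
    f0 = np.asarray(f0, dtype=np.float64)
    f0_up = np.repeat(f0, hop)            # hold each frame value for `hop` samples
    voiced = f0_up > 0
    # Running phase of the fundamental (unvoiced samples add zero phase).
    phase = 2.0 * np.pi * np.cumsum(f0_up / sample_rate)
    nyquist = sample_rate / 2.0
    n_harm = int(nyquist // f0_up[voiced].min()) if voiced.any() else 0
    harmonic = np.zeros_like(f0_up)
    for k in range(1, n_harm + 1):
        # Keep the k-th harmonic only where it stays below Nyquist.
        keep = voiced & (k * f0_up < nyquist)
        harmonic += np.where(keep, np.sin(k * phase), 0.0)
    noise = rng.normal(0.0, noise_std, size=f0_up.shape)
    # Voiced samples get harmonics, unvoiced samples get noise.
    return np.where(voiced, harmonic, noise)

def stft_amp_phase(x, n_fft=320, hop=80):
    """Amplitude and phase spectra from a Hann-windowed STFT (assumed settings)."""
    window = np.hanning(n_fft)
    x = np.pad(x, (n_fft // 2, n_fft // 2), mode="reflect")
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    spec = np.fft.rfft(frames, axis=1)
    return np.abs(spec), np.angle(spec)

# Ten unvoiced frames followed by ten voiced frames at 200 Hz.
f0 = [0.0] * 10 + [200.0] * 10
excitation = build_excitation(f0)
amp, phase = stft_amp_phase(excitation)
```

In this sketch, `amp` and `phase` stand in for the excitation spectra that the filter would map to speech spectra before ISTFT; in voiced frames the amplitude spectrum already shows peaks at multiples of F0, which is the kind of prior the abstract refers to.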

Taxonomy

Topics: Neural Networks and Applications