QHARMA-GAN: Quasi-Harmonic Neural Vocoder based on Autoregressive Moving Average Model
Shaowen Chen, Tomoki Toda

TL;DR
QHARMA-GAN introduces a neural vocoder that combines quasi-harmonic modeling with ARMA functions, enabling high-quality, flexible speech synthesis with reduced computational complexity and improved interpretability.
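The TL;DR does not say exactly how the ARMA functions enter the model. The sketch below illustrates only the general idea, under stated assumptions. It evaluates an ARMA transfer function H(z) = B(z)/A(z) on the unit circle at the harmonic frequencies k·f0. It then uses those complex values as the magnitudes and phases of a harmonic sum. The function names (`arma_response`, `harmonic_amplitudes`, `synthesize_harmonics`) and the coefficient convention (a[0] = 1) are hypothetical choices for this example. This is not the QHARMA-GAN implementation.

```python
import cmath
import math

# Hypothetical sketch: not the authors' code.

def arma_response(b, a, freq_hz, fs):
    """Evaluate H(e^{jw}) = B(e^{jw}) / A(e^{jw}) at one frequency.

    b: MA (numerator) coefficients [b0, b1, ..., bq]
    a: AR (denominator) coefficients [1, a1, ..., ap] (assumed a[0] == 1)
    """
    w = 2 * math.pi * freq_hz / fs          # normalized angular frequency (rad/sample)
    z_inv = cmath.exp(-1j * w)              # e^{-jw}
    num = sum(bk * z_inv ** k for k, bk in enumerate(b))
    den = sum(ak * z_inv ** k for k, ak in enumerate(a))
    return num / den

def harmonic_amplitudes(f0, n_harmonics, b, a, fs):
    """Assumed mapping: complex amplitude of harmonic k is H evaluated at k * f0."""
    return [arma_response(b, a, k * f0, fs) for k in range(1, n_harmonics + 1)]

def synthesize_harmonics(f0, amps, n_samples, fs):
    """s[n] = sum_k |A_k| * cos(2*pi*k*f0*n/fs + arg(A_k))."""
    out = []
    for n in range(n_samples):
        t = n / fs
        s = 0.0
        for k, amp in enumerate(amps, start=1):
            s += abs(amp) * math.cos(2 * math.pi * k * f0 * t + cmath.phase(amp))
        out.append(s)
    return out

# Usage: a one-pole low-pass ARMA response shaping 10 harmonics of a 200 Hz pitch.
amps = harmonic_amplitudes(200.0, 10, [1.0], [1.0, -0.9], 16000)
signal = synthesize_harmonics(200.0, amps, 160, 16000)
```

With b = [1.0] and a = [1.0, -0.9], the response is a one-pole low-pass. Harmonic magnitudes therefore decrease with harmonic index below the Nyquist frequency. In a neural vocoder of this kind, a network would presumably predict such coefficients frame by frame. That mapping is also an assumption here.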
Contribution
This work presents a novel neural vocoder framework that integrates the quasi-harmonic model (QHM) with autoregressive moving average (ARMA) models, improving speech synthesis quality, flexibility, and efficiency over existing end-to-end neural vocoders.
Findings
Outperforms existing methods in synthesis quality
Achieves faster speech generation speeds
Allows flexible pitch shifting and time stretching
Abstract
Vocoders, which encode speech signals into acoustic features and allow speech signals to be reconstructed from them, have been studied for decades. Recently, the rise of deep learning has driven the development of neural vocoders that generate high-quality speech signals. However, existing end-to-end neural vocoders suffer from a black-box nature that obscures the speech production mechanism and the intrinsic structure of speech. As a result, source excitation and resonance characteristics cannot be modeled separately in a clear way, and the ability to flexibly synthesize or modify speech with high quality is lost. Moreover, their sequence-wise waveform generation usually requires complicated networks, leading to substantial time consumption. In this work, inspired by the quasi-harmonic model (QHM) that represents speech as sparse components, we combine the neural network and QHM…
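The abstract builds on the quasi-harmonic model (QHM). In the standard QHM formulation from the sinusoidal-modeling literature, a short frame centered at t = 0 is written as s(t) = Σ_k (a_k + t·b_k) e^{j2π f_k t}. Here a_k is a complex amplitude and b_k is a complex slope. The slope lets the model absorb a small mismatch η_k between the assumed frequency f_k and the true one. Expanding a_k e^{j2π(f_k+η_k)t} ≈ (a_k + j2πη_k a_k t) e^{j2π f_k t} gives b_k ≈ j2πη_k a_k, so η_k ≈ Im(b_k·conj(a_k)) / (2π|a_k|²).

The sketch below implements only that frame model and mismatch estimate. It assumes one-sided complex components and takes the real part. How QHARMA-GAN parameterizes or predicts a_k and b_k is not described here, and the function names are hypothetical.

```python
import cmath
import math

# Hypothetical sketch of a textbook QHM frame: not the QHARMA-GAN generator.

def qhm_frame(a, b, freqs, fs, n_samples):
    """Real part of sum_k (a_k + t*b_k) * exp(j*2*pi*f_k*t), with t = 0 at the frame center."""
    half = n_samples // 2
    out = []
    for n in range(n_samples):
        t = (n - half) / fs  # time in seconds relative to the frame center
        s = sum((ak + t * bk) * cmath.exp(2j * math.pi * fk * t)
                for ak, bk, fk in zip(a, b, freqs))
        out.append(s.real)
    return out

def frequency_mismatch(ak, bk):
    """First-order estimate eta_k = Im(b_k * conj(a_k)) / (2*pi*|a_k|^2), in Hz."""
    return (bk * ak.conjugate()).imag / (2 * math.pi * abs(ak) ** 2)

# Usage: a component whose slope encodes a +5 Hz offset from the assumed 200 Hz.
a_k = 2 + 1j
b_k = 1j * 2 * math.pi * 5.0 * a_k
refined_f = 200.0 + frequency_mismatch(a_k, b_k)  # approximately 205.0
```

Adding η_k to f_k and re-estimating the amplitudes is the iterative frequency-refinement idea usually associated with QHM.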
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Voice and Speech Disorders
