QHARMA-GAN: Quasi-Harmonic Neural Vocoder based on Autoregressive Moving Average Model

Shaowen Chen; Tomoki Toda

arXiv:2507.01611 · eess.AS · July 3, 2025

TL;DR

QHARMA-GAN introduces a neural vocoder that combines quasi-harmonic modeling with ARMA functions, enabling high-quality, flexible speech synthesis with reduced computational complexity and improved interpretability.
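The TL;DR pairs harmonic modeling with ARMA functions. For reference, an ARMA filter has the rational transfer function H(z) = B(z)/A(z), with MA coefficients b_0..b_q in the numerator and AR coefficients 1, a_1..a_p in the denominator. The sketch below assumes, rather than takes from the paper, that such a function describes the resonance characteristics and is sampled at the harmonic frequencies k·f0 to give each harmonic a magnitude and phase. `arma_response` and `harmonic_amplitudes` are hypothetical helpers.

```python
import numpy as np


def arma_response(b, a, omega):
    """Evaluate the ARMA frequency response H(z) = B(z) / A(z) at z = e^{j*omega}.

    b: MA (numerator) coefficients [b_0, ..., b_q]
    a: AR (denominator) coefficients [1, a_1, ..., a_p]
    omega: normalized angular frequencies in radians/sample
    """
    z_inv = np.exp(-1j * np.asarray(omega, dtype=float))  # z^{-1} on the unit circle
    # np.polyval expects the highest power first, so reverse to get sum_j c_j * z^{-j}
    num = np.polyval(np.asarray(b, dtype=float)[::-1], z_inv)
    den = np.polyval(np.asarray(a, dtype=float)[::-1], z_inv)
    return num / den


def harmonic_amplitudes(f0, fs, b, a):
    """Sample the ARMA response at harmonics k*f0 up to the Nyquist frequency.

    Returns (harmonic frequencies in Hz, complex values); the magnitude and
    angle of each complex value serve as that harmonic's amplitude and phase.
    """
    n_harmonics = int(np.floor((fs / 2) / f0))
    freqs = f0 * np.arange(1, n_harmonics + 1)
    return freqs, arma_response(b, a, 2 * np.pi * freqs / fs)


# Example (hypothetical coefficients): a stable 2nd-order AR resonance with one MA zero,
# F0 = 200 Hz, fs = 16 kHz
freqs, h = harmonic_amplitudes(200.0, 16000, b=[1.0, 0.5], a=[1.0, -1.2, 0.8])
```

Evaluating a low-order rational function at the harmonics needs only a handful of coefficients per frame instead of a sample-by-sample waveform; whether QHARMA-GAN uses exactly this parameterization is not stated on this page.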

Contribution

This work presents a novel neural vocoder framework that integrates the quasi-harmonic model (QHM) with ARMA models, improving speech synthesis quality, flexibility, and efficiency over existing end-to-end neural vocoders.

Findings

01 Outperforms existing methods in synthesis quality
02 Achieves faster speech generation
03 Allows flexible pitch shifting and time stretching
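Pitch shifting and time stretching are natural operations once speech is described by an F0 contour plus a spectral envelope. The sketch below is a generic harmonic-synthesis illustration, not the paper's procedure: it assumes frame-level F0 values for voiced frames only, and an envelope callable that returns a complex amplitude per frequency. Pitch shifting scales the F0 contour while the envelope (the resonances) stays fixed and is re-sampled at the new harmonic positions; time stretching resamples the frame-level contour to more or fewer frames. `synth_harmonic`, `pitch_shift`, `time_stretch`, and `example_envelope` are hypothetical names.

```python
import numpy as np


def synth_harmonic(f0_frames, hop, fs, envelope, n_harmonics=20):
    """Additive harmonic synthesis from a frame-level F0 contour (voiced frames only).

    f0_frames: F0 in Hz for each frame
    hop: hop size in samples between frames
    envelope: callable mapping frequencies in Hz to complex amplitudes
    """
    f0_frames = np.asarray(f0_frames, dtype=float)
    n_samples = len(f0_frames) * hop
    # Linearly interpolate the frame-level F0 to one value per sample
    f0 = np.interp(np.arange(n_samples), np.arange(len(f0_frames)) * hop, f0_frames)
    # Integrate F0 to obtain the running phase of the fundamental
    phase = 2 * np.pi * np.cumsum(f0) / fs
    y = np.zeros(n_samples)
    for k in range(1, n_harmonics + 1):
        fk = k * f0
        amp = envelope(fk)  # envelope sampled at the current harmonic positions
        # Drop harmonics at or above Nyquist to avoid aliasing
        y += np.where(fk < fs / 2, np.abs(amp) * np.cos(k * phase + np.angle(amp)), 0.0)
    return y


def pitch_shift(f0_frames, semitones):
    """Scale the F0 contour; the envelope (resonances) is left unchanged."""
    return np.asarray(f0_frames, dtype=float) * 2.0 ** (semitones / 12.0)


def time_stretch(f0_frames, rate):
    """Resample the frame-level contour to round(rate * n) frames (rate > 1 is slower)."""
    f0_frames = np.asarray(f0_frames, dtype=float)
    n = len(f0_frames)
    m = int(round(n * rate))
    return np.interp(np.linspace(0, n - 1, m), np.arange(n), f0_frames)


def example_envelope(f, fs=16000):
    """Hypothetical one-pole resonance, used only for illustration."""
    return 1.0 / (1.0 - 0.9 * np.exp(-2j * np.pi * np.asarray(f) / fs))


fs, hop = 16000, 160
f0_contour = np.full(100, 150.0)  # 1 s of constant 150 Hz F0
y_up = synth_harmonic(pitch_shift(f0_contour, 3), hop, fs, example_envelope)
y_slow = synth_harmonic(time_stretch(f0_contour, 1.5), hop, fs, example_envelope)
```

Because the envelope is evaluated at the shifted harmonics rather than moved with them, formant positions stay put while the pitch changes, which is the usual reason parametric vocoders handle these modifications more gracefully than waveform-level resampling.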

Abstract

Vocoders, which encode speech signals into acoustic features and reconstruct speech signals from them, have been studied for decades. Recently, the rise of deep learning has driven the development of neural vocoders that generate high-quality speech signals. However, existing end-to-end neural vocoders suffer from a black-box nature that obscures the speech production mechanism and the intrinsic structure of speech. This makes it ambiguous to model source excitation and resonance characteristics separately, and it prevents flexible, high-quality synthesis or modification of speech. Moreover, their sequence-wise waveform generation usually requires complicated networks, leading to substantial time consumption. In this work, inspired by the quasi-harmonic model (QHM) that represents speech as sparse components, we combine the neural network and QHM…
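The abstract describes QHM as representing speech with sparse components. In the quasi-harmonic model as usually formulated in the speech-analysis literature, a short frame is written as x(t) = Σ_k (a_k + t·b_k) e^{j2πf_k t}, where a_k are complex amplitudes, b_k are complex slopes, and the frequencies f_k lie near, but not necessarily exactly on, multiples of F0; the slope terms let each component absorb small amplitude and frequency deviations within the frame. The sketch below shows only this classic signal model with a least-squares fit for fixed frequencies. How QHARMA-GAN predicts these components with a neural network is not covered by the excerpt above, and `qhm_basis`, `qhm_synth`, and `qhm_fit` are hypothetical names.

```python
import numpy as np


def qhm_basis(freqs_hz, fs, n):
    """Complex exponentials e^{j2πf_k t} and their time-weighted copies t·e^{j2πf_k t}.

    Time is centered on the frame so the slopes b_k are measured relative to
    the frame center. Returns (t, E, tE) with E and tE of shape (n, K).
    """
    t = (np.arange(n) - (n - 1) / 2) / fs  # seconds, centered at 0
    E = np.exp(2j * np.pi * np.outer(t, np.asarray(freqs_hz, dtype=float)))
    return t, E, t[:, None] * E


def qhm_synth(a, b, freqs_hz, fs, n):
    """Real QHM frame: x(t) = 2 Re{ sum_k (a_k + t b_k) e^{j2πf_k t} }."""
    _, E, tE = qhm_basis(freqs_hz, fs, n)
    return 2 * np.real(E @ a + tE @ b)


def qhm_fit(x, freqs_hz, fs):
    """Least-squares estimate of complex a_k and b_k for fixed frequencies f_k.

    A real frame contains both +f_k and -f_k components, so the unknowns are
    [a, conj(a), b, conj(b)]; only the positive-frequency half is returned.
    """
    _, E, tE = qhm_basis(freqs_hz, fs, len(x))
    K = E.shape[1]
    basis = np.hstack([E, E.conj(), tE, tE.conj()])
    coef, *_ = np.linalg.lstsq(basis, np.asarray(x, dtype=complex), rcond=None)
    return coef[:K], coef[2 * K:3 * K]
```

With a few components per frame, a 30 ms frame of hundreds of samples is summarized by a handful of complex numbers, which is the sense in which the representation is sparse.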

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Voice and Speech Disorders