FA-GAN: Artifacts-free and Phase-aware High-fidelity GAN-based Vocoder
Rubing Shen, Yanzhen Ren, Zongkun Sun

TL;DR
FA-GAN is a GAN-based vocoder that reduces spectral artifacts and improves speech fidelity by combining anti-aliased upsampling in the generator with a phase-aware loss, yielding higher-quality synthesized speech, especially for unseen speakers.
Contribution
The paper proposes FA-GAN, a vocoder that pairs an anti-aliased twin deconvolution module in the generator with a fine-grained multi-resolution real and imaginary loss, targeting aliasing and blurring artifacts to improve the fidelity of synthesized speech.
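As a generic illustration of why non-ideal upsampling can leave high-frequency artifacts, the sketch below upsamples a pure tone by zero insertion and measures the mirrored spectral image before and after a low-pass filter. It is not the paper's twin deconvolution module, whose details are not given on this page; the function names, the 31-tap Hann-windowed sinc filter, and the cutoff of 0.25 cycles per sample are all assumptions chosen for the demo.

```python
import cmath
import math


def dft_magnitudes(x):
    """Magnitudes of DFT bins 0..len(x)//2 (naive O(n^2) transform)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def zero_insert(x, factor=2):
    """Upsample by inserting factor-1 zeros after every sample."""
    out = []
    for sample in x:
        out.append(sample)
        out.extend([0.0] * (factor - 1))
    return out


def lowpass_taps(num_taps=31, cutoff=0.25):
    """Hann-windowed sinc low-pass filter; cutoff is in cycles per sample."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        ideal = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps


def circular_filter(x, taps):
    """Circular convolution, so the tone stays exactly periodic in the buffer."""
    n = len(x)
    return [sum(taps[j] * x[(i - j) % n] for j in range(len(taps))) for i in range(n)]


# A tone in bin 4 of a 64-sample buffer; after 2x zero insertion the
# 128-sample spectrum holds the tone at bin 4 and a mirrored image at bin 60.
tone = [math.sin(2 * math.pi * 4 * t / 64) for t in range(64)]
upsampled = zero_insert(tone, factor=2)
raw_mag = dft_magnitudes(upsampled)
filt_mag = dft_magnitudes(circular_filter(upsampled, lowpass_taps()))

print("image/tone ratio before filter:", raw_mag[60] / raw_mag[4])
print("image/tone ratio after filter: ", filt_mag[60] / filt_mag[4])
```

Without filtering, the image is as strong as the tone itself; the low-pass filter suppresses it. Anti-aliased upsampling designs generally aim for the same effect inside a learned generator.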
Findings
FA-GAN outperforms the compared approaches in audio quality.
It effectively reduces spectral artifacts.
It generalizes well to unseen speakers.
Abstract
Generative adversarial network (GAN) based vocoders have attracted significant attention in speech synthesis for their high quality and fast inference speed. However, many noticeable spectral artifacts remain, degrading the quality of the synthesized speech. In this work, we present a novel GAN-based vocoder designed for few artifacts and high fidelity, called FA-GAN. To suppress the aliasing artifacts caused by non-ideal upsampling layers in high-frequency components, we introduce an anti-aliased twin deconvolution module in the generator. To alleviate blurring artifacts and enrich the reconstruction of spectral details, we propose a novel fine-grained multi-resolution real and imaginary loss to assist in the modeling of phase information. Experimental results reveal that FA-GAN outperforms the compared approaches in promoting audio quality and alleviating spectral…
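To make the idea of a multi-resolution real and imaginary loss concrete, here is a minimal sketch that compares the real and imaginary parts of short-time Fourier transforms at several window sizes. The exact formulation used in FA-GAN (including what makes it fine-grained, the resolutions, and any weighting) is not stated in the abstract, so the L1 distance, the three resolutions, and all function names below are assumptions for illustration only.

```python
import cmath
import math


def stft(signal, n_fft, hop):
    """Naive STFT with a Hann window; returns a list of frames of complex bins."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(n_fft)]
        bins = []
        for k in range(n_fft // 2 + 1):
            acc = 0j
            for n in range(n_fft):
                acc += seg[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
            bins.append(acc)
        frames.append(bins)
    return frames


def real_imag_l1(ref, gen, n_fft, hop):
    """Mean L1 distance over real parts plus imaginary parts at one resolution."""
    total, count = 0.0, 0
    for frame_ref, frame_gen in zip(stft(ref, n_fft, hop), stft(gen, n_fft, hop)):
        for a, b in zip(frame_ref, frame_gen):
            total += abs(a.real - b.real) + abs(a.imag - b.imag)
            count += 1
    return total / count


def multi_resolution_ri_loss(ref, gen, resolutions=((16, 4), (32, 8), (64, 16))):
    """Average the real/imaginary distance over several (n_fft, hop) pairs."""
    return sum(real_imag_l1(ref, gen, n, h) for n, h in resolutions) / len(resolutions)


# Same tone, different phase: a loss on real and imaginary parts notices the shift.
ref = [math.sin(2 * math.pi * 5 * t / 128) for t in range(128)]
shifted = [math.cos(2 * math.pi * 5 * t / 128) for t in range(128)]

loss_same = multi_resolution_ri_loss(ref, ref)
loss_shifted = multi_resolution_ri_loss(ref, shifted)
print("identical signals:", loss_same)
print("phase-shifted copy:", loss_shifted)
```

Because the real and imaginary parts jointly encode phase, a phase-shifted copy of the same tone is penalized here, which a loss on magnitudes alone would largely miss.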
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Digital Filter Design and Implementation
Methods: Softmax · Attention Is All You Need
