An Investigation of Time-Frequency Representation Discriminators for High-Fidelity Vocoder

Yicheng Gu; Xueyao Zhang; Liumeng Xue; Haizhou Li; Zhizheng Wu

arXiv:2404.17161·cs.SD·April 29, 2024

TL;DR

This paper introduces multi-scale discriminators for GAN vocoders based on the Constant-Q Transform (CQT) and the Continuous Wavelet Transform (CWT). They improve synthesis quality by better capturing pitch and transient features, for both speech and singing voices.

Contribution

The paper proposes two new multi-scale discriminators, one using CQT and one using CWT, that improve GAN vocoder performance by modeling dynamic time-frequency features.

Findings

01. CQT discriminator excels in pitch modeling.

02. CWT discriminator captures short-time transients.

03. Combined discriminators improve synthesis quality across models.

Abstract

Generative Adversarial Network (GAN) based vocoders are superior in both inference speed and synthesis quality when reconstructing an audible waveform from an acoustic representation. This study focuses on improving the discriminator for GAN-based vocoders. Most existing Time-Frequency Representation (TFR)-based discriminators are rooted in the Short-Time Fourier Transform (STFT), which has a constant Time-Frequency (TF) resolution, linearly scaled center frequencies, and a fixed decomposition basis, making it incompatible with signals like singing voices that require dynamic attention for different frequency bands and different time intervals. Motivated by that, we propose a Multi-Scale Sub-Band Constant-Q Transform (MS-SB-CQT) discriminator and a Multi-Scale Temporal-Compressed Continuous Wavelet Transform (MS-TC-CWT) discriminator. Both CQT and CWT have a dynamic TF resolution…
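The contrast the abstract draws between the STFT's linearly scaled center frequencies and the CQT's frequency-dependent resolution can be illustrated with a short sketch. It uses the standard textbook definitions (STFT bins at k * sample_rate / n_fft, CQT bins at f_min * 2^(k / bins_per_octave)) and is not the paper's implementation. The sample rate, FFT size, minimum frequency, and bin counts are illustrative assumptions, not values taken from the paper.

```python
def stft_center_frequencies(sample_rate, n_fft):
    """Linearly spaced bin centers: f_k = k * sample_rate / n_fft."""
    return [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]


def cqt_center_frequencies(f_min, n_bins, bins_per_octave):
    """Geometrically spaced bin centers: f_k = f_min * 2^(k / bins_per_octave)."""
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


def cqt_q_factor(bins_per_octave):
    """Q = f_k / (f_{k+1} - f_k), which is the same for every bin."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)


# Illustrative values only (assumptions, not taken from the paper).
SAMPLE_RATE = 24000      # Hz
N_FFT = 1024             # STFT window length in samples
F_MIN = 32.70            # Hz, roughly the note C1
BINS_PER_OCTAVE = 12     # one bin per semitone
N_BINS = 96              # eight octaves

stft_freqs = stft_center_frequencies(SAMPLE_RATE, N_FFT)
cqt_freqs = cqt_center_frequencies(F_MIN, N_BINS, BINS_PER_OCTAVE)
q = cqt_q_factor(BINS_PER_OCTAVE)

# STFT: every bin is the same width, regardless of frequency.
stft_spacing = stft_freqs[1] - stft_freqs[0]

# CQT: bandwidth grows in proportion to center frequency, so low bins
# are narrow (fine pitch resolution) and high bins are wide.
cqt_bandwidths = [f / q for f in cqt_freqs]

print(f"STFT bin spacing (constant): {stft_spacing:.2f} Hz")
print(f"CQT Q factor (constant):     {q:.2f}")
print(f"CQT bandwidth, lowest bin:   {cqt_bandwidths[0]:.2f} Hz")
print(f"CQT bandwidth, highest bin:  {cqt_bandwidths[-1]:.2f} Hz")
```

With one bin per semitone, the CQT grid lines up with musical pitch, which is consistent with the finding above that the CQT discriminator helps pitch modeling. The STFT grid spends the same bandwidth on every bin, so its low-frequency bins are coarse relative to the pitch intervals they cover.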

Taxonomy

Topics: Sensor Technology and Measurement Systems