NU-GAN: High resolution neural upsampling with GAN

Rithesh Kumar; Kundan Kumar; Vicki Anand; Yoshua Bengio; Aaron Courville

arXiv:2010.11362·cs.SD·October 23, 2020·22 cites

TL;DR

NU-GAN is a GAN-based method for upsampling audio from 22 kHz to 44.1 kHz, intended as a separate component in speech synthesis pipelines. In ABX tests, listeners distinguish its output from original high-resolution audio only slightly more often than chance.

Contribution

The paper presents NU-GAN, a novel GAN-based approach specifically designed for high-resolution audio upsampling in speech synthesis pipelines.

Findings

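01. Resamples 22 kHz audio to 44.1 kHz with minimal perceptual difference

02. In ABX preference tests, listeners distinguish resampled audio from original audio only 7.4% (single-speaker) and 10.8% (multi-speaker) above random chance

03. Effective on both single-speaker and multi-speaker datasets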
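For context on what the upsampling task involves, here is a minimal sketch of a conventional 2x upsampler using linear interpolation. This is not NU-GAN and is not taken from the paper; it is a crude baseline shown only to illustrate the problem. The function name `upsample_2x_linear` is hypothetical, and the sketch assumes the "22 kHz" input is sampled at 22,050 Hz so that the ratio to 44.1 kHz is exactly 2.

```python
import math

SOURCE_RATE_HZ = 22_050   # assumption: "22 kHz" read as 22,050 Hz
TARGET_RATE_HZ = 44_100   # 2x the assumed source rate


def upsample_2x_linear(samples):
    """Double the sampling rate by inserting the midpoint between neighbours.

    Returns 2 * len(samples) - 1 values for a non-empty input.
    """
    if not samples:
        return []
    out = []
    for cur, nxt in zip(samples, samples[1:]):
        out.append(cur)
        out.append((cur + nxt) / 2.0)  # interpolated in-between sample
    out.append(samples[-1])
    return out


# Usage: 10 ms of a 1 kHz sine at the assumed source rate.
n = SOURCE_RATE_HZ // 100
low_rate = [math.sin(2 * math.pi * 1000 * i / SOURCE_RATE_HZ) for i in range(n)]
high_rate = upsample_2x_linear(low_rate)
```

Interpolation of this kind only smooths between existing samples. It does not recover the content between roughly 11 kHz and 22 kHz that a native 44.1 kHz recording would contain, which is the gap a generative upsampler is meant to fill.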

Abstract

In this paper, we propose NU-GAN, a new method for resampling audio from lower to higher sampling rates (upsampling). Audio upsampling is an important problem since productionizing generative speech technology requires operating at high sampling rates. Such applications use audio at a resolution of 44.1 kHz or 48 kHz, whereas current speech synthesis methods are equipped to handle a maximum of 24 kHz resolution. NU-GAN takes a leap towards solving audio upsampling as a separate component in the text-to-speech (TTS) pipeline by leveraging techniques for audio generation using GANs. ABX preference tests indicate that our NU-GAN resampler is capable of resampling 22 kHz audio to 44.1 kHz such that it is distinguishable from the original audio only 7.4% higher than random chance for a single-speaker dataset, and 10.8% higher than chance for a multi-speaker dataset.
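The ABX figures above can be read with a short scoring sketch. It assumes that "X% higher than random chance" means the listeners' accuracy minus the 50% chance level, in percentage points; the abstract does not spell this out. The function name `abx_summary` and the trial counts in the usage lines are hypothetical and are not results from the paper.

```python
from math import comb


def abx_summary(num_correct, num_trials, chance=0.5):
    """Summarise an ABX test.

    Returns (accuracy, margin over chance in percentage points,
    one-sided exact binomial p-value for doing at least this well by guessing).
    """
    if num_trials <= 0 or not 0 <= num_correct <= num_trials:
        raise ValueError("need 0 <= num_correct <= num_trials and num_trials > 0")
    accuracy = num_correct / num_trials
    margin_pp = (accuracy - chance) * 100.0
    # P(X >= num_correct) when every trial is a guess with success rate `chance`
    p_value = sum(
        comb(num_trials, k) * chance**k * (1.0 - chance) ** (num_trials - k)
        for k in range(num_correct, num_trials + 1)
    )
    return accuracy, margin_pp, p_value


# Hypothetical example: 574 correct identifications out of 1000 trials
# corresponds to a margin of about 7.4 percentage points over chance.
accuracy, margin_pp, p_value = abx_summary(574, 1000)
```

A small margin over chance means listeners rarely tell the upsampled audio from the original. Whether a given margin is statistically distinguishable from guessing depends on the number of trials, which is why the sketch also returns a p-value.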

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing