Enhancing Low-Quality Voice Recordings Using Disentangled Channel Factor and Neural Waveform Model

Haoyu Li; Yang Ai; Junichi Yamagishi

arXiv:2011.05038·eess.AS·November 11, 2020

TL;DR

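This paper introduces an encoder-decoder neural network that enhances low-quality voice recordings: it removes channel characteristics from the input, conditions on a channel factor taken from a reference recording, and synthesizes the waveform with a neural vocoder. It is reported to improve speech quality over baseline systems.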

Contribution

It proposes a novel encoder-decoder architecture with adversarial training to separate channel factors and improve speech quality in low-quality recordings.

Findings

01 Significantly improves speech quality over baseline systems

02 Effectively disentangles channel characteristics from audio

03 Generates professional-quality speech from low-quality inputs

Abstract

High-quality speech corpora are essential foundations for most speech applications. However, such speech data are expensive and limited since they are collected in professional recording environments. In this work, we propose an encoder-decoder neural network to automatically enhance low-quality recordings to professional high-quality recordings. To address channel variability, we first filter out the channel characteristics from the original input audio using the encoder network with adversarial training. Next, we disentangle the channel factor from a reference audio. Conditioned on this factor, an auto-regressive decoder is then used to predict the target-environment Mel spectrogram. Finally, we apply a neural vocoder to synthesize the speech waveform. Experimental results show that the proposed system can generate a professional high-quality speech waveform when setting high-quality…
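
The data flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the layer sizes, the function names (`encode_content`, `extract_channel_factor`, `decode`, `enhance`), and the single-matrix "networks" with random, untrained weights are all assumptions made for the example. Adversarial training of the encoder and the neural vocoder stage are omitted; the sketch only shows how a content representation from the input and a channel factor from a reference recording feed an auto-regressive decoder that emits Mel frames.

```python
import numpy as np

# Assumed sizes for illustration only; the paper's dimensions are not given here.
N_MELS, CONTENT_DIM, CHANNEL_DIM = 80, 64, 16

def init_params(seed=0):
    """Random, untrained weights standing in for learned networks."""
    rng = np.random.default_rng(seed)
    return {
        "enc": 0.1 * rng.standard_normal((N_MELS, CONTENT_DIM)),
        "chan": 0.1 * rng.standard_normal((N_MELS, CHANNEL_DIM)),
        "dec_in": 0.1 * rng.standard_normal((CONTENT_DIM + CHANNEL_DIM, N_MELS)),
        "dec_prev": 0.01 * rng.standard_normal((N_MELS, N_MELS)),
    }

def encode_content(mel, p):
    """Frame-wise content representation: (T, N_MELS) -> (T, CONTENT_DIM).
    In the paper this encoder is trained adversarially to drop channel cues."""
    return np.tanh(mel @ p["enc"])

def extract_channel_factor(ref_mel, p):
    """One channel vector per reference recording: (T_ref, N_MELS) -> (CHANNEL_DIM,)."""
    return np.tanh(ref_mel @ p["chan"]).mean(axis=0)

def decode(content, channel, p):
    """Auto-regressive decoding: each output frame depends on the current
    content frame, the channel factor, and the previously predicted frame."""
    prev = np.zeros(N_MELS)
    frames = []
    for c in content:
        prev = np.concatenate([c, channel]) @ p["dec_in"] + prev @ p["dec_prev"]
        frames.append(prev)
    return np.stack(frames)

def enhance(low_mel, ref_mel, p):
    """Predict a target-environment Mel spectrogram for low_mel,
    conditioned on the channel factor of ref_mel."""
    content = encode_content(low_mel, p)
    channel = extract_channel_factor(ref_mel, p)
    return decode(content, channel, p)

rng = np.random.default_rng(1)
low_mel = rng.standard_normal((50, N_MELS))   # stand-in for a low-quality input
ref_mel = rng.standard_normal((30, N_MELS))   # stand-in for a reference recording
enhanced = enhance(low_mel, ref_mel, init_params())
print(enhanced.shape)  # (50, 80)
```

The output has one Mel frame per input frame, and swapping the reference recording changes the channel factor and therefore the predicted spectrogram. In the described system a neural vocoder would then turn this spectrogram into a waveform.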

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing