CleanUNet 2: A Hybrid Speech Denoising Model on Waveform and Spectrogram

Zhifeng Kong; Wei Ping; Ambrish Dantrey; Bryan Catanzaro

arXiv:2309.05975·cs.LG·September 13, 2023

TL;DR

CleanUNet 2 introduces a hybrid speech denoising approach combining waveform and spectrogram models, leading to superior denoising performance through a two-stage framework inspired by speech synthesis techniques.

Contribution

It presents a novel two-stage speech denoising model that integrates waveform and spectrogram denoisers, enhancing denoising effectiveness over existing methods.

Findings

01 Outperforms previous speech denoising methods in objective metrics

02 Achieves better subjective audio quality in evaluations

03 Demonstrates the effectiveness of combining waveform and spectrogram models

Abstract

In this work, we present CleanUNet 2, a speech denoising model that combines the advantages of a waveform denoiser and a spectrogram denoiser and achieves the best of both worlds. CleanUNet 2 uses a two-stage framework inspired by popular speech synthesis methods that consist of a waveform model and a spectrogram model. Specifically, CleanUNet 2 builds upon CleanUNet, the state-of-the-art waveform denoiser, and further boosts its performance by taking predicted spectrograms from a spectrogram denoiser as input. We demonstrate that CleanUNet 2 outperforms previous methods in terms of various objective and subjective evaluations.
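
The two-stage data flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (magnitude_spectrogram, two_stage_denoise, identity_spec_denoiser, passthrough_wave_denoiser), the tiny FFT size, and the naive DFT are all hypothetical, and the two "denoisers" are placeholders standing in for trained networks. It also assumes the waveform stage receives the noisy waveform alongside the predicted spectrogram, which the abstract does not spell out.

```python
import cmath
import math
from typing import Callable, List

Waveform = List[float]
Spectrogram = List[List[float]]  # frames x frequency bins, magnitudes only


def magnitude_spectrogram(wave: Waveform, n_fft: int = 8, hop: int = 4) -> Spectrogram:
    """Naive DFT magnitude spectrogram (no window, no padding); illustration only."""
    frames: Spectrogram = []
    for start in range(0, len(wave) - n_fft + 1, hop):
        frame = wave[start:start + n_fft]
        frames.append([
            abs(sum(x * cmath.exp(-2j * math.pi * k * n / n_fft)
                    for n, x in enumerate(frame)))
            for k in range(n_fft // 2 + 1)
        ])
    return frames


def two_stage_denoise(
    noisy: Waveform,
    spec_denoiser: Callable[[Spectrogram], Spectrogram],
    wave_denoiser: Callable[[Waveform, Spectrogram], Waveform],
) -> Waveform:
    """Stage 1 predicts a clean spectrogram; stage 2 uses it to denoise the waveform."""
    noisy_spec = magnitude_spectrogram(noisy)
    predicted_spec = spec_denoiser(noisy_spec)      # stage 1: spectrogram model
    return wave_denoiser(noisy, predicted_spec)     # stage 2: waveform model


# Hypothetical placeholders for the two trained models (they do no real denoising).
def identity_spec_denoiser(spec: Spectrogram) -> Spectrogram:
    return [row[:] for row in spec]


def passthrough_wave_denoiser(noisy: Waveform, predicted_spec: Spectrogram) -> Waveform:
    return list(noisy)


noisy = [math.sin(0.5 * t) for t in range(16)]
denoised = two_stage_denoise(noisy, identity_spec_denoiser, passthrough_wave_denoiser)
```

In a real system the two placeholder callables would be replaced by the trained spectrogram and waveform networks; the sketch only shows how the predicted spectrogram is passed from the first stage to the second.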
