Speech Denoising in the Waveform Domain with Self-Attention

Zhifeng Kong; Wei Ping; Ambrish Dantrey; Bryan Catanzaro

arXiv:2202.07790 · cs.SD · July 8, 2022



TL;DR

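This paper introduces CleanUNet, a causal speech denoising model that operates directly on the raw waveform. It refines the bottleneck of an encoder-decoder network with self-attention and is trained with losses on both the waveform and multi-resolution spectrograms; the authors report better denoised speech quality than prior state-of-the-art models on objective and subjective metrics.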

Contribution

The paper presents an encoder-decoder model for waveform-domain speech denoising in which several self-attention blocks refine the bottleneck representation, a step the authors describe as crucial to obtaining good results.

Findings

01

Outperforms state-of-the-art denoising models on various objective and subjective speech-quality metrics

02

Uses self-attention to refine bottleneck representations

03

Optimized with multi-resolution spectrogram losses
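To make the third finding concrete, here is a minimal NumPy sketch of a loss that combines a waveform term with spectrogram terms at several STFT resolutions. The page only states that the losses are defined over the waveform and multi-resolution spectrograms; the specific terms used here (L1 on the waveform, spectral convergence plus log-magnitude L1 per resolution), the function names, and the (n_fft, hop) pairs are assumptions chosen for illustration, not the authors' exact configuration.

```python
import numpy as np


def stft_magnitude(x, n_fft, hop):
    """Magnitude spectrogram of a 1-D signal using a Hann window.

    Frames are taken without padding, so len(x) must be >= n_fft.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=-1))


def multi_resolution_loss(
    clean,
    denoised,
    # Hypothetical (n_fft, hop) pairs; not taken from the paper.
    resolutions=((512, 128), (1024, 256), (2048, 512)),
    eps=1e-8,
):
    """Waveform L1 plus the average of per-resolution spectrogram terms."""
    wave_l1 = np.mean(np.abs(clean - denoised))

    spec_terms = []
    for n_fft, hop in resolutions:
        s_clean = stft_magnitude(clean, n_fft, hop)
        s_den = stft_magnitude(denoised, n_fft, hop)
        # Spectral convergence: relative Frobenius-norm error (assumed term).
        sc = np.linalg.norm(s_clean - s_den) / (np.linalg.norm(s_clean) + eps)
        # Mean absolute difference of log magnitudes (assumed term).
        log_mag = np.mean(np.abs(np.log(s_clean + eps) - np.log(s_den + eps)))
        spec_terms.append(sc + log_mag)

    return wave_l1 + np.mean(spec_terms)


# Usage: one second of a 220 Hz tone at 16 kHz, with and without added noise.
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)
noisy = clean + 0.1 * rng.standard_normal(16000)
print(multi_resolution_loss(clean, noisy))
```

Evaluating the spectrogram terms at several window sizes trades time resolution against frequency resolution, so no single STFT setting dominates what the model is penalized for.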

Abstract

In this work, we present CleanUNet, a causal speech denoising model on the raw waveform. The proposed model is based on an encoder-decoder architecture combined with several self-attention blocks to refine its bottleneck representations, which is crucial to obtain good results. The model is optimized through a set of losses defined over both waveform and multi-resolution spectrograms. The proposed method outperforms the state-of-the-art models in terms of denoised speech quality from various objective and subjective evaluation metrics. We release our code and models at https://github.com/nvidia/cleanunet.
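As an illustration of how self-attention can remain compatible with a causal model, the sketch below applies single-head scaled dot-product attention to a sequence of bottleneck frames with an upper-triangular mask, so each frame attends only to itself and earlier frames. The abstract does not spell out the attention configuration; the single head, the projection matrices `w_q`, `w_k`, `w_v`, and the function name are assumptions made for this example.

```python
import numpy as np


def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention with a causal mask.

    x:   (T, d) sequence of bottleneck frames (T time steps).
    w_*: (d, d_k) projection matrices (hypothetical parameters).
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])

    # Mask out future positions: frame t may not attend to frames > t.
    n_steps = x.shape[0]
    future = np.triu(np.ones((n_steps, n_steps), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)

    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


# Usage: perturbing later frames leaves earlier outputs unchanged.
rng = np.random.default_rng(0)
T, d = 6, 4
x = rng.standard_normal((T, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))

y = causal_self_attention(x, w_q, w_k, w_v)
x_future_changed = x.copy()
x_future_changed[4:] += 1.0
y_changed = causal_self_attention(x_future_changed, w_q, w_k, w_v)
print(np.allclose(y[:4], y_changed[:4]))
```

The mask is what lets the output at each time step depend only on past and present input, which is the property a causal denoiser needs.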


Code & Models

Repositories

nvidia/cleanunet (PyTorch, official)



Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Data Compression Techniques