MANNER: Multi-view Attention Network for Noise Erasure

Hyun Joon Park; Byung Ha Kang; Wooseok Shin; Jin Sob Kim; Sung Won Han

arXiv:2203.02181 · eess.AS · March 7, 2022

TL;DR

MANNER is a multi-view attention network that enhances noisy speech in the time domain. It extracts three different representations from the noisy signal and reports state-of-the-art results on the VoiceBank-DEMAND dataset while processing speech efficiently.

Contribution

The paper introduces a multi-view attention block inside a convolutional encoder-decoder for time-domain speech denoising, targeting the limited representations and poor memory efficiency of earlier dual-path models.

Findings

01 Achieves state-of-the-art performance on the VoiceBank-DEMAND dataset.

02 Processes noisy speech efficiently while estimating high-quality clean speech.

03 Outperforms existing methods on the five objective speech quality metrics used for evaluation.

Abstract

In the field of speech enhancement, time domain methods have difficulties in achieving both high performance and efficiency. Recently, dual-path models have been adopted to represent long sequential features, but they still have limited representations and poor memory efficiency. In this study, we propose Multi-view Attention Network for Noise ERasure (MANNER) consisting of a convolutional encoder-decoder with a multi-view attention block, applied to the time-domain signals. MANNER efficiently extracts three different representations from noisy speech and estimates high-quality clean speech. We evaluated MANNER on the VoiceBank-DEMAND dataset in terms of five objective speech quality metrics. Experimental results show that MANNER achieves state-of-the-art performance while efficiently processing noisy speech.
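The dual-path idea mentioned in the abstract can be illustrated with a short sketch. This is a generic illustration of dual-path segmentation, not code from MANNER or its repository: the function names (`num_chunks`, `segment`, `inter_chunk_view`, `attention_pairs`), the 50% overlap, and the zero-padding of the tail are all assumptions chosen for the example. It splits a long sequence into overlapping chunks, builds the intra-chunk and inter-chunk views, and counts pairwise attention comparisons for each approach.

```python
def num_chunks(n_frames, chunk_len):
    """Number of 50%-overlapping chunks needed to cover n_frames (assumed scheme)."""
    if chunk_len < 2 or chunk_len % 2:
        raise ValueError("chunk_len must be an even integer >= 2")
    hop = chunk_len // 2
    # Ceiling division on the frames left after the first chunk.
    return max(1, -(-(n_frames - chunk_len) // hop) + 1)


def segment(frames, chunk_len):
    """Split a sequence into 50%-overlapping chunks, zero-padding the tail."""
    hop = chunk_len // 2
    n = num_chunks(len(frames), chunk_len)
    padded_len = (n - 1) * hop + chunk_len
    padded = list(frames) + [0.0] * (padded_len - len(frames))
    # Each row is one chunk: the intra-chunk (short-range) view.
    return [padded[i * hop : i * hop + chunk_len] for i in range(n)]


def inter_chunk_view(chunks):
    """Transpose the chunks: row k holds position k of every chunk (long-range view)."""
    return [list(column) for column in zip(*chunks)]


def attention_pairs(n_frames, chunk_len):
    """Pairwise comparisons for full attention vs. intra- plus inter-chunk attention."""
    s = num_chunks(n_frames, chunk_len)
    full = n_frames * n_frames
    dual = s * chunk_len**2 + chunk_len * s**2
    return full, dual


if __name__ == "__main__":
    chunks = segment(list(range(10)), chunk_len=4)
    print(chunks)                    # intra-chunk view
    print(inter_chunk_view(chunks))  # inter-chunk view
    print(attention_pairs(16000, 250))
```

With overlapping chunks the segmented tensor stores roughly twice as many frames as the original sequence, which is one plausible reading of the memory cost the abstract attributes to dual-path models; the abstract itself does not spell out the cause.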

Code & Models

Repositories

winddori2002/MANNER (PyTorch, official)

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Adaptive Filtering Techniques