MUSE: Flexible Voiceprint Receptive Fields and Multi-Path Fusion Enhanced Taylor Transformer for U-Net-based Speech Enhancement

Zizhen Lin; Xiaoting Chen; Junyu Wang

arXiv:2406.04589 · cs.SD · September 18, 2024

TL;DR

This paper presents MUSE, a lightweight speech enhancement model using a novel MET Transformer with flexible receptive fields and multi-path fusion, achieving high performance with minimal parameters.

Contribution

Introduces a Multi-path Enhanced Taylor Transformer with Deformable Embedding and attention fusion for efficient speech enhancement within a U-net architecture.

Findings

1. Achieves competitive speech enhancement performance
2. Reduces model size to 0.51M parameters
3. Demonstrates lower training and deployment costs

Abstract

Achieving a balance between lightweight design and high performance remains a challenging task for speech enhancement. In this paper, we introduce Multi-path Enhanced Taylor (MET) Transformer-based U-net for Speech Enhancement (MUSE), a lightweight speech enhancement network built upon the U-net architecture. Our approach incorporates a novel MET Transformer block, which integrates Deformable Embedding (DE) to enable flexible receptive fields for voiceprints. The MET Transformer is designed to fuse Channel and Spatial Attention (CSA) branches, facilitating channel information exchange and addressing spatial attention deficits within the Taylor-Transformer framework. Through extensive experiments conducted on the VoiceBank+DEMAND dataset, we demonstrate that MUSE achieves competitive performance while significantly reducing both training and…
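
The abstract names the Taylor-Transformer framework but does not give its attention formula. As a rough illustration only, the sketch below assumes that framework relies on the first-order Taylor approximation of softmax attention, exp(q·k) ≈ 1 + q·k, which lets the sums over keys be computed once and reused for every query, so the cost grows linearly with sequence length. The function names, shapes, and the L2 normalization step are hypothetical choices for this example and are not taken from the MUSE repository; Deformable Embedding, the CSA branches, and multi-head handling are omitted.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    # Scale rows to (just under) unit length so every weight 1 + q.k stays positive.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def taylor_attention_quadratic(q, k, v):
    # Reference form: build the full N x N weight matrix w_ij = 1 + q_i . k_j.
    w = 1.0 + q @ k.T
    return (w @ v) / w.sum(axis=1, keepdims=True)


def taylor_attention_linear(q, k, v):
    # Same result without the N x N matrix: precompute sums over the keys.
    n = k.shape[0]
    kv = k.T @ v             # (d, d_v): sum_j k_j v_j^T
    v_sum = v.sum(axis=0)    # (d_v,):   sum_j v_j
    k_sum = k.sum(axis=0)    # (d,):     sum_j k_j
    numerator = v_sum + q @ kv      # (n, d_v)
    denominator = n + q @ k_sum     # (n,)
    return numerator / denominator[:, None]


# Toy inputs (assumed sizes): 6 tokens, key dimension 4, value dimension 3.
rng = np.random.default_rng(0)
q = l2_normalize(rng.standard_normal((6, 4)))
k = l2_normalize(rng.standard_normal((6, 4)))
v = rng.standard_normal((6, 3))

out_quadratic = taylor_attention_quadratic(q, k, v)
out_linear = taylor_attention_linear(q, k, v)
```

Both functions compute the same quantity; the linear form simply regroups the sums, which is where the efficiency of this family of attention approximations would come from under the stated assumption.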

Code & Models

Repositories

huaidanquede/MUSE-Speech-Enhancement
PyTorch · Official

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Adaptive Filtering Techniques