EDNet: A Versatile Speech Enhancement Framework with Gating Mamba Mechanism and Phase Shift-Invariant Training

Doyeop Kwak; Youngjoon Jang; Seongyu Kim; Joon Son Chung

arXiv:2506.16231·eess.AS·February 5, 2026

Open Access

TL;DR

EDNet is a flexible speech enhancement framework that adaptively combines masking and mapping techniques using a gating mechanism and employs phase shift-invariant training for improved phase estimation across various distortion types.

Contribution

The paper introduces EDNet, a novel versatile speech enhancement model with a gating mechanism and phase shift-invariant training, capable of handling multiple distortion types without prior assumptions.

Findings

01. Consistently outperforms existing methods across various tasks.
02. Effectively combines masking and mapping for diverse distortions.
03. Improves phase estimation with shift-tolerant training.

Abstract

Speech signals in real-world environments are frequently affected by various distortions such as additive noise, reverberation, and bandwidth limitation, which may appear individually or in combination. Traditional speech enhancement methods typically rely on either masking, which focuses on suppressing non-speech components while preserving observable structure, or mapping, which seeks to recover clean speech through direct transformation of the input. Each approach offers strengths in specific scenarios but may be less effective outside its target conditions. We propose the Erase and Draw Network (EDNet), a versatile speech enhancement framework designed to handle a broad range of distortion types without prior assumptions about task or input characteristics. EDNet consists of two main components: (1) the Gating Mamba (GM) module, which adaptively combines masking and mapping through…
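The contrast between masking and mapping drawn in the abstract can be illustrated with a toy example. The sketch below is not the paper's Gating Mamba module, whose details are cut off above. It assumes a simple per-bin convex blend, and the names `apply_mask` and `gated_blend` and all values are hypothetical. It shows only that a mask can rescale what is already observed, while a mapping branch can supply content for a bin the input lacks.

```python
from typing import List


def apply_mask(noisy: List[complex], mask: List[float]) -> List[complex]:
    """Masking: scale each observed spectral bin by a gain in [0, 1]."""
    return [m * x for m, x in zip(mask, noisy)]


def gated_blend(
    masked: List[complex], mapped: List[complex], gate: List[float]
) -> List[complex]:
    """Assumed blend (not from the paper): gate=1 keeps the masked bin,
    gate=0 keeps the mapped bin, values in between mix the two."""
    return [g * a + (1.0 - g) * b for g, a, b in zip(gate, masked, mapped)]


# Three toy spectral bins. Bin 1 is empty in the input, as it might be
# after bandwidth limitation, so no mask gain can restore it.
noisy = [1 + 1j, 0j, 2 + 0j]
mask = [0.5, 1.0, 0.0]               # hypothetical mask-branch output
mapped = [0.2 + 0.2j, 0.3 + 0j, 0j]  # hypothetical mapping-branch output
gate = [1.0, 0.0, 0.5]               # hypothetical per-bin gate values

enhanced = gated_blend(apply_mask(noisy, mask), mapped, gate)
print(enhanced)
```

In this toy setup, bin 0 is taken entirely from the masking branch, bin 1 entirely from the mapping branch, and bin 2 is an even mix of the two. A learned model would predict the mask, mapped spectrum, and gate from the input rather than take them as constants.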

Taxonomy

Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Speech Recognition and Synthesis

Methods: Mamba: Linear-Time Sequence Modeling with Selective State Spaces