Global Rotation Equivariant Phase Modeling for Speech Enhancement with Deep Magnitude-Phase Interaction

Chengzhong Wang; Andong Li; Dingding Yao; Junfeng Li

arXiv:2602.08556 · cs.SD · May 18, 2026

TL;DR

This paper introduces a deep learning framework that models speech phase with global rotation equivariance (GRE), improving performance on several speech enhancement tasks by respecting the circular nature of phase.
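As an illustration of why the circular nature of phase matters (this is not code from the paper), the minimal NumPy sketch here compares a flat Euclidean difference with a wrapped angular difference for two phases on either side of ±π. The helper names euclidean_phase_gap and circular_phase_gap are hypothetical.

```python
import numpy as np

def euclidean_phase_gap(a, b):
    # Treats phases as points on a flat real line, ignoring wrap-around.
    return abs(a - b)

def circular_phase_gap(a, b):
    # Wraps the difference into (-pi, pi] via exp/angle, so angles
    # just below +pi and just above -pi are recognized as neighbors.
    return abs(np.angle(np.exp(1j * (a - b))))

a, b = np.pi - 0.05, -np.pi + 0.05
print(round(euclidean_phase_gap(a, b), 2))  # 6.18
print(round(circular_phase_gap(a, b), 2))   # 0.1
```

Two phases that are nearly identical on the unit circle look almost 2π apart to a Euclidean metric, which is the kind of mismatch a flat feature space has to learn around.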

Contribution

It proposes a magnitude-phase dual-stream architecture with GRE-preserving modules, advancing phase modeling in speech enhancement.

Findings

01. Reduces Phase Distance by over 20% in phase retrieval.
02. Improves PESQ by more than 0.1 in zero-shot denoising.
03. Reports superior performance across phase retrieval, denoising, dereverberation, and bandwidth extension tasks.

Abstract

While deep learning has advanced speech enhancement (SE), effective phase modeling remains challenging, because conventional networks typically operate in a flat Euclidean feature space, which makes it difficult to model the underlying circular topology of phase. To address this, we propose a magnitude-phase dual-stream framework that aligns the phase stream with its intrinsic circular geometry by enforcing Global Rotation Equivariance (GRE). Specifically, we introduce a Magnitude-Phase Interactive Convolutional Module (MPICM) for modulus-based information exchange and a Hybrid-Attention Dual Feed-Forward Network (HADF) bottleneck for unified feature fusion, both designed to preserve GRE in the phase stream. Comprehensive evaluations are conducted across phase retrieval, denoising, dereverberation, and bandwidth extension tasks to validate the superiority of the…
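The internals of MPICM and HADF are not described on this page, so the following is only a minimal NumPy sketch of the GRE property itself, under the assumption that phase-stream features are complex values, so that rotating every phase by θ means multiplying by e^{jθ}. A layer f is then GRE-preserving if f(e^{jθ}x) = e^{jθ}f(x). The functions complex_linear, modulus_gate, phase_stream and with_real_bias are hypothetical: a bias-free complex linear map and a real gate computed from moduli (loosely echoing "modulus-based information exchange") both satisfy the property, while adding a real bias breaks it.

```python
import numpy as np

rng = np.random.default_rng(0)

def complex_linear(x, w):
    # Bias-free complex linear map: w @ (c * x) == c * (w @ x) for any complex c.
    return w @ x

def modulus_gate(x, mag):
    # Hypothetical magnitude-phase interaction: a real sigmoid gate built from
    # a magnitude-stream feature and |x|. |x| is unchanged by a global rotation,
    # so the gate is rotation-invariant and the gated output stays equivariant.
    gate = 1.0 / (1.0 + np.exp(-(mag + np.abs(x))))
    return gate * x

def phase_stream(x, mag, w):
    # Composition of two GRE-preserving operations is GRE-preserving.
    return modulus_gate(complex_linear(x, w), mag)

def with_real_bias(x, w):
    # A real additive bias does not rotate with the input, breaking GRE.
    return w @ x + 1.0

n = 8
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)   # phase-stream features
mag = rng.standard_normal(n)                                 # magnitude-stream features
w = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
rot = np.exp(1j * 0.7)                                       # global rotation by 0.7 rad

print(np.allclose(phase_stream(rot * x, mag, w), rot * phase_stream(x, mag, w)))  # True
print(np.allclose(with_real_bias(rot * x, w), rot * with_real_bias(x, w)))        # False
```

The design choice illustrated is that the magnitude stream can influence the phase stream only through rotation-invariant quantities such as moduli, which keeps the global phase reference free.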

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Hearing Loss and Rehabilitation