On The Compensation Between Magnitude and Phase in Speech Separation
Zhong-Qiu Wang, Gordon Wichern, Jonathan Le Roux

TL;DR
This paper investigates the interplay between magnitude and phase in speech separation, showing how implicit compensation between the two affects the quality and intelligibility of separated speech, especially when neural networks are trained without an explicit magnitude loss.
Contribution
The paper offers a novel view of the implicit compensation between estimated magnitude and phase, with analytical insights supported by experiments in noisy-reverberant conditions.
Findings
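Loss functions defined solely in the time or complex domain, with no magnitude term, tend to score better on objective time-domain metrics but worse on speech quality and intelligibility metrics, and they usually hurt speech recognition.
Estimated magnitude and phase implicitly compensate for each other, and this compensation influences separation performance.
Analytical results, supported by experiments in noisy-reverberant conditions, back this explanation.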
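The compensation described in the findings above can be illustrated with a single time-frequency bin. The following is a minimal sketch, not the authors' derivation or training loss: it assumes a squared-error loss on the complex value, a fixed (imperfect) estimated phase, and hypothetical helper names. Under those assumptions, the magnitude that minimizes the complex-domain error is the projection of the target onto the estimated phase direction, which is smaller than the true magnitude whenever the phase is wrong.

```python
import numpy as np


def best_magnitude_for_fixed_phase(target, est_phase):
    """Magnitude m >= 0 minimizing |target - m * exp(1j * est_phase)|**2.

    Hypothetical helper for illustration. The minimizer is the projection
    of the target onto the estimated phase direction, clipped at zero:
    |target| * cos(true_phase - est_phase).
    """
    projection = np.real(target * np.exp(-1j * est_phase))
    return np.maximum(projection, 0.0)


def complex_sq_error(target, magnitude, phase):
    """Squared error between a complex target and a polar-form estimate."""
    return np.abs(target - magnitude * np.exp(1j * phase)) ** 2


# Single T-F bin with unit magnitude and zero phase (illustrative values).
target = 1.0 + 0.0j

for phase_error_deg in (0, 30, 60, 90):
    est_phase = np.deg2rad(phase_error_deg)
    m_opt = best_magnitude_for_fixed_phase(target, est_phase)
    loss_opt = complex_sq_error(target, m_opt, est_phase)
    loss_true_mag = complex_sq_error(target, np.abs(target), est_phase)
    print(
        f"phase error {phase_error_deg:3d} deg: "
        f"complex-optimal magnitude {m_opt:.3f}, "
        f"loss {loss_opt:.3f} (vs {loss_true_mag:.3f} with the true magnitude)"
    )
```

In this toy setting the complex-domain loss prefers a shrunken magnitude as the phase error grows, so a lower complex-domain error comes at the cost of a larger magnitude error. Adding a magnitude term to the loss penalizes that shrinkage directly.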
Abstract
Deep neural network (DNN) based end-to-end optimization in the complex time-frequency (T-F) domain or time domain has shown considerable potential in monaural speech separation. Many recent studies optimize loss functions defined solely in the time or complex domain, without including a loss on magnitude. Although such loss functions typically produce better scores when the evaluation metrics are objective time-domain metrics, they produce worse scores on speech quality and intelligibility metrics and usually lead to worse speech recognition performance, compared with including a loss on magnitude. While this phenomenon has been experimentally observed by many studies, it is often not accurately explained, and a thorough understanding of its fundamental cause is lacking. This paper provides a novel view from the perspective of the implicit compensation between estimated magnitude…
