An Explicit Consistency-Preserving Loss Function for Phase Reconstruction and Speech Enhancement

Pin-Jui Ku, Chun-Wei Ho, Hao Yen, Sabato Marco Siniscalchi, and Chin-Hui Lee

arXiv:2409.16282 · eess.AS · September 25, 2024

TL;DR

This paper introduces a loss function for phase reconstruction and speech enhancement that constrains the predicted magnitude and phase to form a consistent STFT. This avoids direct estimation of the original phase, and the approach is reported to be competitive on the VB-DMD and WSJ0-CHiME3 datasets.

Contribution

The paper proposes a consistency-preserving loss function that trains a model to produce a consistent magnitude-phase pair, that is, a set of complex numbers that is the STFT of a real signal, for use in phase reconstruction and speech enhancement.

Findings

01 Viability demonstrated on the phase reconstruction task

02 Competitive results on the VB-DMD dataset

03 Favorable comparison on the WSJ0-CHiME3 dataset

Abstract

In this work, we propose a novel consistency-preserving loss function for recovering the phase information in the context of phase reconstruction (PR) and speech enhancement (SE). Different from conventional techniques that directly estimate the phase using a deep model, our idea is to exploit ad-hoc constraints to directly generate a consistent pair of magnitude and phase. Specifically, the proposed loss forces a set of complex numbers to be a consistent short-time Fourier transform (STFT) representation, i.e., to be the spectrogram of a real signal. Our approach thus avoids the difficulty of estimating the original phase, which is highly unstructured and sensitive to time shift. The influence of our proposed loss is first assessed on a PR task, experimentally demonstrating that our approach is viable. Next, we show its effectiveness on an SE task, using both the VB-DMD and WSJ0-CHiME3…
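The notion of a "consistent" STFT in the abstract can be illustrated with a small numerical check. The sketch below is not the paper's loss, whose exact form is not given on this page. It assumes the classical consistency residual, the mean squared difference between a complex array and the STFT of its own inverse STFT, and uses SciPy's `stft` and `istft` with a Hann window. The function name `consistency_residual` and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np
from scipy.signal import stft, istft


def consistency_residual(spec, nperseg=256, noverlap=192):
    """Mean squared distance between `spec` and STFT(iSTFT(spec)).

    Illustrative assumption: this is the classical consistency measure,
    not necessarily the loss proposed in the paper.
    """
    # Map the complex array back to a real time-domain signal.
    _, signal = istft(spec, nperseg=nperseg, noverlap=noverlap)
    # Re-analyse that signal; a consistent array is reproduced exactly.
    _, _, reprojected = stft(signal, nperseg=nperseg, noverlap=noverlap)
    # Guard against an off-by-one frame count after the round trip.
    frames = min(spec.shape[-1], reprojected.shape[-1])
    diff = spec[..., :frames] - reprojected[..., :frames]
    return float(np.mean(np.abs(diff) ** 2))


rng = np.random.default_rng(0)
x = rng.standard_normal(4096)  # stand-in for a real speech signal

# The STFT of a real signal is consistent by construction.
_, _, true_spec = stft(x, nperseg=256, noverlap=192)
res_true = consistency_residual(true_spec)

# Same magnitude with random phase: generally not the STFT of any signal.
random_phase = rng.uniform(-np.pi, np.pi, size=true_spec.shape)
random_spec = np.abs(true_spec) * np.exp(1j * random_phase)
res_random = consistency_residual(random_spec)

print(f"true phase residual:   {res_true:.3e}")
print(f"random phase residual: {res_random:.3e}")
```

The residual for the true STFT is at the level of floating-point round-off, while the random-phase version leaves a clearly nonzero residual. A loss built on a quantity like this penalizes magnitude-phase pairs that do not correspond to any real signal, without requiring the model to match the original phase value by value.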

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Antenna Design and Optimization

Methods: Sparse Evolutionary Training