Neural Speech Phase Prediction based on Parallel Estimation Architecture and Anti-Wrapping Losses

Yang Ai; Zhen-Hua Ling

arXiv:2211.15974·cs.SD·February 17, 2023

TL;DR

This paper introduces a neural network model that predicts speech phase directly from amplitude spectra. It combines a parallel estimation architecture with anti-wrapping losses, and reconstructs speech with higher quality and at higher speed than existing methods such as the Griffin-Lim algorithm.

Contribution

The paper proposes a novel neural speech phase prediction model with a parallel estimation architecture, which restricts predicted phase values to the principal value interval, and anti-wrapping losses, which avoid the error expansion caused by phase wrapping.

Findings

01 Outperforms Griffin-Lim algorithm in speech quality
02 Faster speech reconstruction with neural network
03 Effectively handles phase wrapping errors

Abstract

This paper presents a novel speech phase prediction model which predicts wrapped phase spectra directly from amplitude spectra by neural networks. The proposed model is a cascade of a residual convolutional network and a parallel estimation architecture. The parallel estimation architecture is composed of two parallel linear convolutional layers and a phase calculation formula, imitating the process of calculating the phase spectra from the real and imaginary parts of complex spectra and strictly restricting the predicted phase values to the principal value interval. To avoid the error expansion issue caused by phase wrapping, we design anti-wrapping training losses defined between the predicted wrapped phase spectra and natural ones by activating the instantaneous phase error, group delay error and instantaneous angular frequency error using an anti-wrapping function. Experimental…
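The two ideas in the abstract, computing phase from real and imaginary parts and applying an anti-wrapping function to phase errors, can be illustrated with a small sketch. This is not the authors' implementation: the anti-wrapping function used here (distance to the nearest multiple of 2π), the use of plain first differences for group delay and instantaneous angular frequency, and all function names are assumptions made for illustration. The phase spectrum is assumed to be a list of frames, each a list of frequency bins.

```python
import math

def phase_from_parts(real, imag):
    """Two-argument arctangent: the result always lies in [-pi, pi]."""
    return math.atan2(imag, real)

def anti_wrap(x):
    """Assumed anti-wrapping function: distance of x from the nearest multiple of 2*pi."""
    two_pi = 2.0 * math.pi
    return abs(x - two_pi * round(x / two_pi))

def diff_freq(phase):
    """First difference along frequency bins (group-delay-like; sign convention ignored)."""
    return [[row[k + 1] - row[k] for k in range(len(row) - 1)] for row in phase]

def diff_time(phase):
    """First difference along frames (instantaneous-angular-frequency-like)."""
    return [[phase[t + 1][k] - phase[t][k] for k in range(len(phase[t]))]
            for t in range(len(phase) - 1)]

def mean_anti_wrapped_error(pred, ref):
    """Mean of anti_wrap(pred - ref) over all elements of two equally shaped 2-D lists."""
    errors = [anti_wrap(p - r)
              for prow, rrow in zip(pred, ref)
              for p, r in zip(prow, rrow)]
    return sum(errors) / len(errors)

def anti_wrapping_losses(pred, ref):
    """Hypothetical helper returning the three loss terms named in the abstract."""
    return {
        "instantaneous_phase": mean_anti_wrapped_error(pred, ref),
        "group_delay": mean_anti_wrapped_error(diff_freq(pred), diff_freq(ref)),
        "instantaneous_angular_frequency": mean_anti_wrapped_error(
            diff_time(pred), diff_time(ref)),
    }
```

The point of the anti-wrapping step is that two wrapped phases such as 3.0 and -3.0 are only about 0.28 rad apart on the circle, yet their raw difference is 6.0; `anti_wrap(6.0)` returns the smaller, circular distance, so a loss built on it does not penalize predictions that differ from the target by a full turn.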

Code & Models

Repositories

yangai520/nspp (PyTorch, official)

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Phonetics and Phonology Research