Phase-incorporating Speech Enhancement Based on Complex-valued Gaussian Process Latent Variable Model

Sih-Huei Chen; Yuan-Shan Lee; Jia-Ching Wang

arXiv:1612.09150 · cs.SD · January 2, 2017 · 2 cites

TL;DR

This paper introduces a complex-valued Gaussian process latent variable model that enhances speech by directly modifying both the magnitude and the phase of the spectrum, leading to improved speech quality.

Contribution

It proposes a new method that models the entire complex spectrum of speech as a Gaussian process, enabling phase-aware enhancement beyond traditional magnitude-only techniques.
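The gap that phase-aware enhancement targets can be shown with a toy example. The sketch below is an illustration only, not the paper's method: it applies the traditional magnitude-only recipe (a real gain on |Y|, noisy phase reused) to one hypothetical noisy STFT coefficient. Even with an oracle gain, the magnitude is corrected but the phase error of the noisy input survives unchanged. The function names and the numeric values are hypothetical and chosen for the demonstration.

```python
import numpy as np


def magnitude_only_enhance(noisy_stft: np.ndarray, gain: np.ndarray) -> np.ndarray:
    """Traditional approach: scale |Y| by a real gain and reuse the noisy phase."""
    return gain * np.abs(noisy_stft) * np.exp(1j * np.angle(noisy_stft))


def phase_error(estimate: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Absolute phase difference in radians, wrapped to [0, pi]."""
    return np.abs(np.angle(estimate * np.conj(reference)))


# Hypothetical toy values: one clean STFT coefficient plus complex noise.
clean = np.array([1.0 + 1.0j])
noise = np.array([0.5 - 0.8j])
noisy = clean + noise

# Even an oracle real-valued gain can only fix the magnitude...
oracle_gain = np.abs(clean) / np.abs(noisy)
enhanced = magnitude_only_enhance(noisy, oracle_gain)
print("magnitude error:", np.abs(np.abs(enhanced) - np.abs(clean)))

# ...while the phase error of the noisy input is inherited unchanged.
print("phase error (rad):", phase_error(enhanced, clean))
```

With these toy values the remaining phase error is roughly 0.65 rad. Only an estimator that modifies the complex coefficient itself, rather than its magnitude alone, can reduce it.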

Findings

01. Outperforms baseline methods on the CHTTL Mandarin digit dataset

02. Effectively models the complex spectrum with Gaussian processes

03. Improves speech quality by incorporating phase

Abstract

Traditional speech enhancement techniques modify the magnitude of speech in the time-frequency domain and reuse the phase of the noisy speech to resynthesize the time-domain signal. This work proposes a complex-valued Gaussian process latent variable model (CGPLVM) that directly enhances the complex-valued noisy spectrum, modifying not only the magnitude but also the phase. The main idea underlying the method is to model the short-time Fourier transform (STFT) coefficients across the time frames of an utterance as a proper complex Gaussian process (GP) with additive noise. The method projects the spectrum into a low-dimensional subspace, and the hyperparameters of the model are optimized with a likelihood criterion. Experiments were carried out on the CHTTL database, which contains the Mandarin digits zero to nine. Several standard measures are used to…
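The modeling assumption in the abstract can be written down as a likelihood. The sketch below assumes that each frequency bin's sequence of STFT coefficients across N frames is an independent zero-mean proper complex Gaussian CN(0, K), where K is a squared-exponential kernel over hypothetical latent points X (one q-dimensional point per frame) plus white noise. The kernel choice, the independence across bins, and the function names are illustrative assumptions that the abstract does not confirm. For a proper complex Gaussian the density is p(y) = exp(-y^H K^{-1} y) / (π^N det K), so the factors of 1/2 from the real-valued case do not appear.

```python
import numpy as np


def rbf_kernel(X: np.ndarray, lengthscale: float, signal_var: float) -> np.ndarray:
    """Squared-exponential kernel over latent points X (N x q). Assumed kernel choice."""
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard tiny negatives from round-off
    return signal_var * np.exp(-0.5 * sq_dists / lengthscale**2)


def complex_gp_log_likelihood(Y, X, lengthscale, signal_var, noise_var) -> float:
    """
    Log-likelihood of complex data Y (N frames x D bins), assuming each column
    is an independent draw from CN(0, K) with K = k(X, X) + noise_var * I:
        sum_d [ -N log(pi) - log det K - y_d^H K^{-1} y_d ]
    """
    N, D = Y.shape
    K = rbf_kernel(X, lengthscale, signal_var) + noise_var * np.eye(N)
    L = np.linalg.cholesky(K)  # K is real symmetric positive definite
    log_det_K = 2.0 * np.sum(np.log(np.diag(L)))
    # alpha = K^{-1} Y, computed with two solves against the Cholesky factor
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, Y))
    # sum over bins of y_d^H K^{-1} y_d; real because K is Hermitian
    quad = np.real(np.sum(np.conj(Y) * alpha))
    return float(-N * D * np.log(np.pi) - D * log_det_K - quad)


# Hypothetical usage: random stand-ins for N frames x D frequency bins and one
# q-dimensional latent point per frame (not data from the paper).
rng = np.random.default_rng(0)
N, D, q = 20, 8, 2
X = rng.standard_normal((N, q))
Y = rng.standard_normal((N, D)) + 1j * rng.standard_normal((N, D))
print(complex_gp_log_likelihood(Y, X, lengthscale=1.0, signal_var=1.0, noise_var=0.1))
```

In a GPLVM-style fit, X and the hyperparameters would be adjusted to maximize this value, for example by passing its negative to a gradient-based optimizer. The sketch covers only the likelihood, not the step that produces the enhanced spectrum.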

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing