Unsupervised speech enhancement with spectral kurtosis and double deep priors

Hien Ohnaka; Ryoichi Miyazaki

arXiv:2407.03887·cs.SD·July 8, 2024

TL;DR

This paper introduces an unsupervised speech enhancement method that uses two deep-prior networks and a spectral-kurtosis-based loss to separate clean speech from noise. It avoids the stop-timing problem of earlier deep-prior methods and outperforms conventional approaches.

Contribution

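The approach employs two DNNs, one generating clean speech and one generating noise, together with a loss term based on spectral kurtosis. This addresses the difficulty of choosing when to stop training and the trade-off between speech distortion and noise reduction in unsupervised settings.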

Findings

1. Outperforms conventional methods in white Gaussian and environmental noise scenarios

2. Effectively mitigates early stopping problems in speech enhancement

3. Demonstrates improved separation of speech and noise signals

Abstract

This paper proposes an unsupervised DNN-based speech enhancement approach founded on deep priors (DPs). Here, DP signifies that DNNs are more inclined to produce clean speech signals than noise. Conventional methods based on DP typically involve training on a noisy speech signal using a random noise feature as input, stopping training only when a clean speech signal is generated. However, such conventional approaches encounter challenges in determining the optimal stop timing, experience performance degradation due to environmental background noise, and suffer a trade-off between distortion of the clean speech signal and noise reduction performance. To address these challenges, we utilize two DNNs: one to generate a clean speech signal and the other to generate noise. The combined output of these networks closely approximates the noisy speech signal, with a loss term based on spectral…
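
To give a feel for why spectral kurtosis can help tell speech from noise, the following is a minimal NumPy sketch of the textbook per-frequency spectral kurtosis, computed from a Hann-windowed STFT. It is an illustration under stated assumptions, not the loss used in the paper (the abstract is truncated before the loss is specified). The function names, the frame length, the hop size, and the gated-noise "bursty" test signal are all hypothetical choices made for this example.

```python
import numpy as np


def stft_power(x, frame_len=512, hop=256):
    """Power spectrogram |X(t, f)|^2 from a Hann-windowed STFT; shape (frames, bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def spectral_kurtosis(x, frame_len=512, hop=256, eps=1e-12):
    """Textbook spectral kurtosis per frequency bin: E[|X|^4] / E[|X|^2]^2 - 2.

    For stationary Gaussian noise the STFT coefficients are (approximately)
    circular complex Gaussian, so this quantity is close to 0 in most bins.
    NOTE: this is the standard definition, assumed here for illustration;
    the paper may define or normalize it differently.
    """
    p = stft_power(x, frame_len, hop)
    m2 = p.mean(axis=0)          # time average of |X|^2 per bin
    m4 = (p ** 2).mean(axis=0)   # time average of |X|^4 per bin
    return m4 / (m2 ** 2 + eps) - 2.0


rng = np.random.default_rng(0)
fs = 16000                                   # hypothetical sampling rate
t = np.arange(4 * fs) / fs                   # four seconds of samples

stationary = rng.standard_normal(t.size)     # stationary white Gaussian noise

# Crude stand-in for a non-stationary signal: noise gated on in short bursts.
gate = (np.sin(2 * np.pi * 3.0 * t) > 0.8).astype(float)
bursty = rng.standard_normal(t.size) * gate

sk_noise = np.median(spectral_kurtosis(stationary))
sk_bursty = np.median(spectral_kurtosis(bursty))
print(f"median SK, stationary noise: {sk_noise:.2f}")
print(f"median SK, bursty signal:    {sk_bursty:.2f}")
```

In this toy setup the stationary noise yields a median spectral kurtosis near zero, while the bursty signal yields a clearly larger value. That contrast between stationary and non-stationary content is the kind of statistical cue a kurtosis-based loss term could exploit.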

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis