Speech Enhancement by Noise Self-Supervised Rank-Constrained Spatial Covariance Matrix Estimation via Independent Deeply Learned Matrix Analysis

Sota Misawa; Norihiro Takamune; Tomohiko Nakamura; Daichi Kitamura; Hiroshi Saruwatari; Masakazu Une; Shoji Makino

arXiv:2109.04658·cs.SD·September 13, 2021

Open Access

TL;DR

This paper introduces a noise self-supervised rank-constrained spatial covariance matrix estimation method using deep neural networks, improving speech enhancement by better separating target speech from diffuse noise.

Contribution

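It proposes RCSCME with independent deeply learned matrix analysis (IDLMA), a DNN-based supervised extension of ILRMA, as the preprocessing step, together with a noise self-supervised variant that uses estimated noise-only intervals to design the prior of the noise spatial covariance matrix.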

Findings

01

Outperforms conventional RCSCME methods under various noise conditions.

02

Utilizes deep neural networks for improved target and noise separation.

03

Introduces noise self-supervision for better covariance matrix estimation.

Abstract

Rank-constrained spatial covariance matrix estimation (RCSCME) is a method for situations in which directional target speech is mixed with diffuse noise. In conventional RCSCME, independent low-rank matrix analysis (ILRMA) is used as the preprocessing method. We propose RCSCME using independent deeply learned matrix analysis (IDLMA), which is a supervised extension of ILRMA. In this method, IDLMA requires deep neural networks (DNNs) to separate the target speech and the noise. We use Denoiser, a single-channel speech enhancement DNN, in IDLMA to estimate not only the target speech but also the noise. We also propose noise self-supervised RCSCME, in which we estimate the noise-only time intervals from the output of Denoiser and use them to design the prior distribution of the noise spatial covariance matrix for RCSCME. We confirm that the proposed methods outperform the conventional…
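
The noise self-supervision idea in the abstract (find noise-only time intervals from a single-channel enhancer's output, then use them to inform the noise spatial covariance matrix) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration: the function names, the energy-ratio criterion, the threshold value, and the use of a plain sample covariance are hypothetical, since the abstract does not state how the intervals are detected or how the prior distribution is built from them.

```python
import numpy as np

def noise_only_frames(mix_ref, speech_est, ratio_threshold=0.1):
    """Flag frames where the estimated speech carries little energy.

    mix_ref, speech_est: complex STFTs of shape (F, T) for one reference
    channel and for the enhancer's speech estimate (hypothetical inputs).
    Returns a boolean mask of shape (T,). The energy-ratio test and the
    threshold are illustrative assumptions, not the paper's criterion.
    """
    mix_power = np.sum(np.abs(mix_ref) ** 2, axis=0)
    speech_power = np.sum(np.abs(speech_est) ** 2, axis=0)
    ratio = speech_power / np.maximum(mix_power, 1e-12)
    return ratio < ratio_threshold

def noise_scm(mix, frame_mask):
    """Sample spatial covariance per frequency bin over the masked frames.

    mix: complex multichannel STFT of shape (F, T, M).
    Returns an array of shape (F, M, M) holding the average of x x^H.
    """
    x = mix[:, frame_mask, :]
    n_frames = x.shape[1]
    if n_frames == 0:
        raise ValueError("no noise-only frames were detected")
    return np.einsum("ftm,ftn->fmn", x, x.conj()) / n_frames

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    F, T, M = 4, 10, 2
    mix = rng.standard_normal((F, T, M)) + 1j * rng.standard_normal((F, T, M))
    speech_est = mix[:, :, 0].copy()
    speech_est[:, :5] = 0.0  # pretend the enhancer found no speech here
    mask = noise_only_frames(mix[:, :, 0], speech_est)
    print(mask, noise_scm(mix, mask).shape)
```

In a setup like the one the abstract describes, a statistic of this kind would only feed the design of the prior for the noise spatial covariance matrix; the estimation itself is still carried out by RCSCME.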

Taxonomy

Topics: Speech and Audio Processing · Blind Source Separation Techniques · Advanced Adaptive Filtering Techniques