Generalized Spatio-Temporal RNN Beamformer for Target Speech Separation

Yong Xu; Zhuohuang Zhang; Meng Yu; Shi-Xiong Zhang; Dong Yu

arXiv:2101.01280 · cs.SD · April 6, 2021

TL;DR

This paper introduces a spatio-temporal RNN-based beamformer (RNN-BF) for target speech separation. It learns frame-level beamforming weights directly from the estimated speech and noise spatial covariance matrices, and it leaves less residual noise than conventional mask-based MVDR.

Contribution

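It proposes two RNN-based beamformers, an RNN-based generalized eigenvalue beamformer (RNN-GEV) and a more generalized RNN beamformer (GRNN-BF), that learn beamforming weights from the speech and noise covariance matrices. Both are further improved with layer normalization.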

Findings

1. GRNN-BF outperforms prior methods in PESQ, SNR, and WER.
2. Layer normalization enhances beamformer performance.
3. The approach effectively reduces residual noise in separated speech.

Abstract

Although the conventional mask-based minimum variance distortionless response (MVDR) beamformer can reduce non-linear distortion, the residual noise level of MVDR-separated speech is still high. In this paper, we propose a spatio-temporal recurrent neural network based beamformer (RNN-BF) for target speech separation. This new beamforming framework directly learns the beamforming weights from the estimated speech and noise spatial covariance matrices. Leveraging the temporal modeling capability of RNNs, the RNN-BF can automatically accumulate the statistics of the speech and noise covariance matrices to learn the frame-level beamforming weights recursively. An RNN-based generalized eigenvalue (RNN-GEV) beamformer and a more generalized RNN beamformer (GRNN-BF) are proposed. We further improve the RNN-GEV and the GRNN-BF by using layer normalization to replace the commonly…
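For readers unfamiliar with the baseline the abstract contrasts against, the following is a minimal NumPy sketch of a conventional mask-based MVDR beamformer. It is not the proposed RNN-BF and is not taken from the paper's repository. It assumes real-valued time-frequency masks, a mask-weighted average for the covariance estimate, and the reference-channel MVDR form w(f) = Φ_NN⁻¹(f) Φ_SS(f) u / trace(Φ_NN⁻¹(f) Φ_SS(f)). The function names, array layout, and the `diag_load` and `eps` parameters are hypothetical choices made for illustration.

```python
import numpy as np


def masked_covariance(stft, mask, eps=1e-8):
    """Mask-weighted spatial covariance per frequency bin (assumed estimator).

    stft: complex array, shape (channels, frames, freqs)
    mask: real array in [0, 1], shape (frames, freqs)
    returns: complex array, shape (freqs, channels, channels)
    """
    # Sum over frames of mask(t, f) * y(t, f) y(t, f)^H for every frequency.
    cov = np.einsum("tf,ctf,dtf->fcd", mask, stft, stft.conj())
    norm = mask.sum(axis=0) + eps  # total mask weight per frequency
    return cov / norm[:, None, None]


def mvdr_weights(phi_ss, phi_nn, ref_channel=0, diag_load=1e-6, eps=1e-10):
    """Reference-channel MVDR weights, one weight vector per frequency bin.

    phi_ss, phi_nn: complex arrays, shape (freqs, channels, channels)
    returns: complex array, shape (freqs, channels)
    """
    n_ch = phi_nn.shape[-1]
    # Diagonal loading keeps the noise covariance well conditioned.
    phi_nn = phi_nn + diag_load * np.eye(n_ch)
    # Solve Phi_NN X = Phi_SS instead of forming an explicit inverse.
    numerator = np.linalg.solve(phi_nn, phi_ss)
    trace = np.trace(numerator, axis1=-2, axis2=-1) + eps
    # Select the column for the reference microphone (the one-hot vector u).
    return numerator[..., ref_channel] / trace[:, None]


def apply_beamformer(weights, stft):
    """Compute w(f)^H y(t, f) for every frame and frequency.

    weights: (freqs, channels); stft: (channels, frames, freqs)
    returns: complex array, shape (frames, freqs)
    """
    return np.einsum("fc,ctf->tf", weights.conj(), stft)
```

Here the weights are fixed per frequency over the whole utterance. The abstract describes the RNN-BF as instead learning frame-level weights from the same two covariance inputs, which a closed-form sketch like this one does not capture.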

Code & Models

Repositories

yongxuUSTC/grnnbf