GEV Beamforming Supported by DOA-based Masks Generated on Pairs of Microphones

Francois Grondin; Jean-Samuel Lauzon; Jonathan Vincent; Francois Michaud

arXiv:2005.09587 · eess.AS · August 6, 2020 · Interspeech

TL;DR

This paper proposes a versatile GEV beamforming method supported by DOA-based masks generated from microphone pairs, improving speech separation in various array geometries without retraining for each configuration.

Contribution

The paper introduces a novel approach that trains a neural network on microphone pairs to generate masks applicable to arbitrary array shapes, enhancing flexibility in speech separation.
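To make the pairwise idea concrete, here is a minimal sketch of how an arbitrary array can be broken down into microphone pairs and their spacings. The function name `microphone_pairs`, the coordinate convention (metres), and the example square array are assumptions for illustration only and do not come from the paper or its repository.

```python
from itertools import combinations

import numpy as np


def microphone_pairs(positions):
    """List every unordered microphone pair with its spacing.

    positions: sequence of (x, y, z) coordinates in metres (assumed convention).
    Returns a list of (i, j, spacing_in_metres) tuples with i < j.
    """
    positions = np.asarray(positions, dtype=float)
    pairs = []
    for i, j in combinations(range(len(positions)), 2):
        spacing = float(np.linalg.norm(positions[i] - positions[j]))
        pairs.append((i, j, spacing))
    return pairs


# Hypothetical 4-microphone square array with 5 cm sides.
square = [(0.0, 0.0, 0.0), (0.05, 0.0, 0.0), (0.05, 0.05, 0.0), (0.0, 0.05, 0.0)]
pairs = microphone_pairs(square)
for i, j, spacing in pairs:
    print(f"pair ({i}, {j}): {spacing * 100:.2f} cm")
```

An array of M microphones yields M(M-1)/2 such pairs, so adding or removing a microphone only changes the list of pairs, not the network that processes each pair.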

Findings

01. Improved SDR from 4.78 dB to 7.69 dB across different array geometries.
02. Effective in various hardware configurations without retraining.
03. Enhances target speech quality in distant speech processing.

Abstract

Distant speech processing is a challenging task, especially when dealing with the cocktail party effect. Sound source separation is thus often required as a preprocessing step prior to speech recognition to improve the signal-to-distortion ratio (SDR). Recently, a combination of beamforming and speech separation networks has been proposed to improve the target source quality in the direction of arrival of interest. However, with this type of approach, the neural network needs to be trained in advance for a specific microphone array geometry, which limits versatility when adding or removing microphones, or changing the shape of the array. The solution presented in this paper is to train a neural network on pairs of microphones with different spacing and acoustic environmental conditions, and then use this network to estimate a time-frequency mask from all the pairs of microphones forming…
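As a rough illustration of how a time-frequency mask can drive a GEV beamformer, the sketch below computes mask-weighted spatial covariance matrices and takes the principal generalized eigenvector in each frequency bin. It is not the authors' implementation: the function names, array shapes, the use of `1 - mask` as the noise mask, and the diagonal loading are all assumptions, and an oracle mask on synthetic data stands in for the network output. How the per-pair masks are combined is not shown here.

```python
import numpy as np


def masked_covariance(Xf, m):
    """Mask-weighted spatial covariance for one frequency bin.

    Xf: (T, M) complex STFT frames, m: (T,) real weights in [0, 1].
    """
    return (m[:, None] * Xf).T @ Xf.conj() / max(m.sum(), 1e-12)


def gev_weights(X, mask, reg=1e-6):
    """X: (F, T, M) complex STFT, mask: (F, T) target mask (assumed layout).

    Returns (F, M) weights that maximize the target-to-noise power ratio.
    """
    F, T, M = X.shape
    W = np.zeros((F, M), dtype=complex)
    for f in range(F):
        phi_s = masked_covariance(X[f], mask[f])
        phi_n = masked_covariance(X[f], 1.0 - mask[f])
        # Diagonal loading (an assumption) keeps phi_n positive definite.
        phi_n = phi_n + reg * (np.trace(phi_n).real / M + 1e-12) * np.eye(M)
        # Whitening with the Cholesky factor (phi_n = L L^H) turns the
        # generalized problem phi_s w = lambda phi_n w into a Hermitian one.
        L = np.linalg.cholesky(phi_n)
        B = np.linalg.solve(L, phi_s)              # L^{-1} phi_s
        A = np.linalg.solve(L, B.conj().T)         # L^{-1} phi_s L^{-H}
        _, vecs = np.linalg.eigh(A)                # eigenvalues in ascending order
        W[f] = np.linalg.solve(L.conj().T, vecs[:, -1])  # w = L^{-H} v
    return W


def apply_beamformer(W, X):
    """Y[f, t] = w_f^H x[f, t]."""
    return np.einsum("fm,ftm->ft", W.conj(), X)


# Synthetic check: one target with hypothetical steering vectors plus white noise.
rng = np.random.default_rng(0)
F, T, M = 3, 400, 4
a = np.exp(1j * rng.uniform(0, 2 * np.pi, size=(F, M)))
active = (np.arange(T) < T // 2).astype(float)   # target present in first half only
s = (rng.standard_normal((F, T)) + 1j * rng.standard_normal((F, T))) * active
noise = 0.1 * (rng.standard_normal((F, T, M)) + 1j * rng.standard_normal((F, T, M)))
X = s[:, :, None] * a[:, None, :] + noise
mask = np.tile(active, (F, 1))                   # oracle mask, stands in for a network
W = gev_weights(X, mask)
Y = apply_beamformer(W, X)
```

The Cholesky whitening step is one common way to solve the generalized eigenvalue problem with NumPy alone; a dedicated generalized Hermitian solver would serve equally well.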

Code & Models

Repositories

francoisgrondin/steernet (official)
