TL;DR
This paper proposes a versatile generalized eigenvalue (GEV) beamforming method guided by direction-of-arrival (DOA)-based masks estimated from microphone pairs, which improves speech separation across array geometries without retraining for each configuration.
Contribution
The paper introduces a novel approach that trains a neural network on microphone pairs, so that the time-frequency masks it produces can be applied to arrays of arbitrary shape, making speech separation more flexible.
Findings
Improves SDR from 4.78 dB to 7.69 dB across different array geometries.
Effective in various hardware configurations without retraining.
Enhances target speech quality in distant speech processing.
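The SDR figures above are in decibels. As a rough reference point, here is a minimal NumPy sketch of the plain energy-ratio SDR. The summary does not say which SDR variant the paper reports (BSS Eval is common in separation work), so treat this as the textbook definition, not the paper's evaluation code; the function name `sdr_db` is hypothetical.

```python
import numpy as np


def sdr_db(reference, estimate):
    """Plain SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).

    This is the simple energy-ratio form. BSS Eval SDR additionally lets the
    reference pass through a short time-invariant distortion filter before
    the error is measured, so its values can differ from this one.
    """
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    error = reference - estimate
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(error ** 2))
```

Under this definition, a gain of about 3 dB (as in 4.78 dB to 7.69 dB) corresponds to roughly halving the error energy relative to the target energy.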
Abstract
Distant speech processing is a challenging task, especially when dealing with the cocktail party effect. Sound source separation is thus often required as a preprocessing step prior to speech recognition to improve the signal-to-distortion ratio (SDR). Recently, a combination of beamforming and speech separation networks has been proposed to improve the target source quality in the direction of arrival of interest. However, with this type of approach, the neural network needs to be trained in advance for a specific microphone array geometry, which limits versatility when adding or removing microphones, or changing the shape of the array. The solution presented in this paper is to train a neural network on pairs of microphones with different spacing and acoustic environmental conditions, and then use this network to estimate a time-frequency mask from all the pairs of microphones forming…
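The abstract is cut off, but the pipeline it describes is per-pair time-frequency masks feeding a GEV beamformer. Below is a minimal NumPy sketch of the standard mask-based GEV step, assuming the masks already exist; the neural network is not modeled. Averaging the pair masks is a hypothetical aggregation rule chosen only for illustration, and all function names (`microphone_pairs`, `average_pair_masks`, `gev_weights`, `apply_beamformer`) are illustrative, not taken from the paper.

```python
import itertools

import numpy as np


def microphone_pairs(num_mics):
    """All unordered microphone pairs of an M-microphone array: M*(M-1)/2 of them."""
    return list(itertools.combinations(range(num_mics), 2))


def average_pair_masks(pair_masks):
    """Combine per-pair masks of shape (P, F, T) into one (F, T) mask.

    HYPOTHETICAL aggregation rule (plain mean), used only for illustration.
    """
    return np.mean(np.asarray(pair_masks, dtype=float), axis=0)


def gev_weights(X, mask, eps=1e-6):
    """Mask-based GEV beamformer weights, one vector per frequency bin.

    X:    multichannel STFT, complex array of shape (M, F, T).
    mask: target time-frequency mask in [0, 1], shape (F, T).
    Returns complex weights of shape (F, M).
    """
    M, F, _ = X.shape
    W = np.zeros((F, M), dtype=complex)
    for f in range(F):
        Xf = X[:, f, :]                       # (M, T) snapshots at bin f
        m = mask[f]                           # (T,) target weights
        # Mask-weighted spatial covariance matrices of target and interference.
        phi_s = (m * Xf) @ Xf.conj().T / max(m.sum(), eps)
        phi_n = ((1.0 - m) * Xf) @ Xf.conj().T / max((1.0 - m).sum(), eps)
        phi_n = phi_n + eps * np.eye(M)       # diagonal loading keeps phi_n positive definite
        # Solve phi_s w = lambda * phi_n w by whitening with phi_n = L L^H.
        L_inv = np.linalg.inv(np.linalg.cholesky(phi_n))
        C = L_inv @ phi_s @ L_inv.conj().T
        _, vecs = np.linalg.eigh(C)           # eigenvalues in ascending order
        W[f] = L_inv.conj().T @ vecs[:, -1]   # principal generalized eigenvector
    return W


def apply_beamformer(W, X):
    """Beamformer output Y[f, t] = w_f^H x[f, t], shape (F, T)."""
    return np.einsum("fm,mft->ft", W.conj(), X)
```

The principal generalized eigenvector maximizes the ratio of masked (target) to unmasked (interference) output power in each bin. GEV weights are only defined up to a complex scale per frequency, so a normalization step (for example blind analytic normalization) is commonly added before resynthesis.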
