Exploring Optimal DNN Architecture for End-to-End Beamformers Based on Time-frequency References
Yuichiro Koyama, Bhiksha Raj

TL;DR
This paper introduces the W-Net beamformer, a novel neural network architecture that combines two existing beamforming approaches to improve audio signal enhancement across diverse acoustic environments.
Contribution
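The paper presents a combined framework that draws on the strengths of two families of methods: DNN-powered generalized eigenvalue and minimum-variance distortionless response (MVDR) beamformers, and DNN-based filter-estimation methods that compute beamforming filters directly.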
Findings
Outperforms existing methods across various room and noise conditions
Effective in static and mobile noise scenarios
Achieves better scores on evaluation metrics in diverse acoustic environments
Abstract
Acoustic beamformers have been widely used to enhance audio signals. Currently, the best methods are the deep neural network (DNN)-powered variants of the generalized eigenvalue and minimum-variance distortionless response beamformers and the DNN-based filter-estimation methods that are used to directly compute beamforming filters. Both approaches are effective; however, they have blind spots in their generalizability. Therefore, we propose a novel approach for combining these two methods into a single framework that attempts to exploit the best features of both. The resulting model, called the W-Net beamformer, includes two components: the first computes time-frequency references that the second uses to estimate beamforming filters. The results on data that include a wide variety of room and noise conditions, including static and mobile noise sources, show that the proposed beamformer…
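For readers unfamiliar with the first family of methods named in the abstract, the following is a minimal sketch of a mask-based MVDR beamformer in NumPy. It is not the W-Net beamformer and is not taken from the paper. It assumes that a DNN has already produced speech and noise time-frequency masks, and it uses the common reference-microphone formulation of MVDR. The function name `mask_based_mvdr`, its parameters, and the array layout are hypothetical choices made for illustration.

```python
import numpy as np

def mask_based_mvdr(stft, speech_mask, noise_mask, ref_mic=0, eps=1e-8):
    """Illustrative mask-based MVDR (assumed layout, not the paper's model).

    stft:        complex array, shape (mics, frames, freqs)
    speech_mask: real array in [0, 1], shape (frames, freqs), e.g. a DNN output
    noise_mask:  real array in [0, 1], shape (frames, freqs)
    Returns the enhanced single-channel STFT, shape (frames, freqs).
    """
    n_mics, n_frames, n_freqs = stft.shape
    enhanced = np.zeros((n_frames, n_freqs), dtype=complex)
    for f in range(n_freqs):
        x = stft[:, :, f]  # (mics, frames) for this frequency bin
        # Mask-weighted spatial covariance matrices of speech and noise.
        phi_s = (speech_mask[:, f] * x) @ x.conj().T / (speech_mask[:, f].sum() + eps)
        phi_n = (noise_mask[:, f] * x) @ x.conj().T / (noise_mask[:, f].sum() + eps)
        phi_n = phi_n + eps * np.eye(n_mics)  # diagonal loading for stability
        # MVDR filter: w = (phi_n^{-1} phi_s) u / trace(phi_n^{-1} phi_s),
        # where u selects the reference microphone.
        ratio = np.linalg.solve(phi_n, phi_s)
        w = ratio[:, ref_mic] / (np.trace(ratio) + eps)
        # Apply the filter: y(t, f) = w^H x(t, f).
        enhanced[:, f] = w.conj() @ x
    return enhanced
```

In this sketch the masks play the role of a time-frequency reference that determines the filter analytically. A filter-estimation method would instead have a network output the filter coefficients `w` directly.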
Taxonomy
Topics: Speech and Audio Processing · Acoustic Wave Phenomena Research · Music and Audio Processing
