DBNET: DOA-driven beamforming network for end-to-end farfield sound source separation
Ali Aroudi, Sebastian Braun

TL;DR
This paper introduces DBnet, an end-to-end deep learning framework that combines direction-of-arrival (DOA) estimation and beamforming for far-field sound source separation in noisy, reverberant environments. The authors report that it outperforms state-of-the-art source separation methods.
Contribution
The paper presents a DOA-driven beamforming network that is trained without ground-truth DOAs, and extends it end to end with post-masking networks to improve separation performance.
Findings
Outperforms state-of-the-art source separation methods
Effective in reverberant and noisy environments
End-to-end training without ground-truth DOAs
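Training without ground-truth DOAs means the loss is computed only between separated and target waveforms. This summary does not say which distance the authors use, so the sketch below assumes negative scale-invariant SDR (SI-SDR), a common signal-domain choice; the function name `neg_si_sdr` and the `eps` value are illustrative, not taken from the paper.

```python
import numpy as np

def neg_si_sdr(estimate, target, eps=1e-8):
    """Negative scale-invariant SDR in dB (lower is better).

    Assumed stand-in for a signal-domain training loss; it needs only
    the separated and target waveforms, never a DOA label.
    """
    # Remove DC offsets so the projection is not biased by the mean.
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to find the optimal scaling.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = scale * target
    residual = estimate - projection
    ratio = (np.sum(projection ** 2) + eps) / (np.sum(residual ** 2) + eps)
    return -10.0 * np.log10(ratio)
```

Because the loss is invariant to the overall gain of the estimate, a network trained with it is free to choose its own output level; in an end-to-end setup the gradient of such a loss would flow back through the beamforming and DOA estimation layers.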
Abstract
Many deep learning techniques are available to perform source separation and reduce background noise. However, designing an end-to-end multi-channel source separation method using deep learning and conventional acoustic signal processing techniques remains challenging. In this paper, we propose a direction-of-arrival-driven beamforming network (DBnet) consisting of direction-of-arrival (DOA) estimation and beamforming layers for end-to-end source separation. We propose to train DBnet using loss functions that are based solely on the distances between the separated speech signals and the target speech signals, without the need for ground-truth DOAs of the speakers. To improve the source separation performance, we also propose end-to-end extensions of DBnet which incorporate post masking networks. We evaluate the proposed DBnet and its extensions on a very challenging dataset,…
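To make the idea of a DOA driving a beamformer concrete, here is a minimal sketch of a far-field delay-and-sum beamformer in the STFT domain. It is a generic illustration, not the DBnet beamforming layer: the uniform linear array geometry, the speed of sound, and the function names `steering_vector` and `delay_and_sum` are all assumptions.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def steering_vector(doa_rad, mic_positions, freqs):
    """Far-field steering vectors for a linear array.

    doa_rad:       source angle relative to the array axis (radians)
    mic_positions: (M,) microphone positions along the axis (meters)
    freqs:         (F,) frequency-bin centers (Hz)
    returns:       (F, M) complex phase terms
    """
    delays = mic_positions * np.cos(doa_rad) / SPEED_OF_SOUND  # (M,)
    return np.exp(-2j * np.pi * freqs[:, None] * delays[None, :])

def delay_and_sum(stft, doa_rad, mic_positions, freqs):
    """Steer the array toward doa_rad.

    stft:    (M, F, T) complex multi-channel STFT
    returns: (F, T) complex single-channel output
    """
    d = steering_vector(doa_rad, mic_positions, freqs)  # (F, M)
    weights = d / d.shape[1]  # unit gain in the look direction
    # y[f, t] = sum_m conj(w[f, m]) * x[m, f, t]
    return np.einsum("fm,mft->ft", weights.conj(), stft)
```

Every operation here is differentiable with respect to the DOA, which is what would let a signal-level loss train a DOA estimator without DOA labels. Steering at the true DOA passes a plane wave unchanged, while steering elsewhere attenuates it.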
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Acoustic Wave Phenomena Research
