MIMO-DBnet: Multi-channel Input and Multiple Outputs DOA-aware Beamforming Network for Speech Separation

Yanjie Fu; Haoran Yin; Meng Ge; Longbiao Wang; Gaoyan Zhang; Jianwu Dang; Chengyun Deng; Fei Wang

arXiv:2212.03401·eess.AS·December 8, 2022·1 cites

Open Access

TL;DR

This paper introduces MIMO-DBnet, an end-to-end neural network for speech separation that uses multi-channel input and multiple outputs to estimate directions and beamforming weights, improving spatial discrimination without extra cues.

Contribution

The novel MIMO-DBnet architecture enables direction-guided speech separation using only mixture signals, effectively handling phase wrapping issues and enhancing separation accuracy.

Findings

01. Achieves significant improvement over baseline systems.

02. Maintains high-frequency performance despite phase wrapping.

03. Provides effective spatial discrimination guidance.
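To make the phase-wrapping issue named in the findings concrete, here is a minimal sketch of how the inter-channel phase difference (IPD) of a far-field source becomes ambiguous at high frequencies. It is not taken from the paper: the two-microphone geometry, the 0.1 m spacing used in the usage note, the speed of sound, and the function names are all illustrative assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value, not from the paper


def ipd(freq_hz, doa_deg, mic_spacing_m):
    """Return (true_phase, wrapped_phase) in radians for a two-microphone pair.

    Assumes a far-field source whose direction of arrival (DOA) is measured
    from the array axis, so the inter-microphone delay is d * cos(theta) / c.
    """
    delay_s = mic_spacing_m * math.cos(math.radians(doa_deg)) / SPEED_OF_SOUND
    true_phase = 2.0 * math.pi * freq_hz * delay_s
    # An observed phase is only known modulo 2*pi; atan2 maps it to [-pi, pi].
    wrapped_phase = math.atan2(math.sin(true_phase), math.cos(true_phase))
    return true_phase, wrapped_phase


def wrapping_onset_hz(doa_deg, mic_spacing_m):
    """Lowest frequency at which |true phase| reaches pi for this DOA."""
    projected = abs(mic_spacing_m * math.cos(math.radians(doa_deg)))
    if projected < 1e-12:  # broadside source: no delay, so no wrapping
        return math.inf
    return SPEED_OF_SOUND / (2.0 * projected)
```

With a hypothetical 0.1 m spacing and a source on the array axis, `wrapping_onset_hz(0, 0.1)` is 1715 Hz. Above that frequency the wrapped IPD no longer maps to a unique direction, which is why an explicit directional cue can help a beamformer in the upper bands.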

Abstract

Recently, many deep-learning-based beamformers have been proposed for multi-channel speech separation. Nevertheless, most of them rely on extra cues known in advance, such as speaker features, face images, or directional information. In this paper, we propose MIMO-DBnet, an end-to-end beamforming network for direction-guided speech separation given only the mixture signal. Specifically, we design a multi-channel input and multiple outputs architecture to predict the direction-of-arrival (DOA) based embeddings and beamforming weights for each source. The precisely estimated directional embedding provides effective spatial discrimination guidance for the neural beamformer to offset the effect of phase wrapping, thus allowing more accurate reconstruction of the two sources' speech signals. Experiments show that our proposed MIMO-DBnet not only achieves a comprehensive decent improvement…
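The abstract describes predicting beamforming weights for each source. As a rough sketch of what applying such weights could look like, the code below runs a frequency-domain filter-and-sum over a multi-channel spectrogram, producing one output per source. The tensor layout, the time-invariant weights, the conjugate convention, and the name `apply_beamformers` are assumptions for illustration only; the paper's actual formulation may differ.

```python
def apply_beamformers(mixture, weights):
    """Apply one set of beamforming weights per source to a multi-channel mixture.

    mixture[c][t][f]: complex STFT value for channel c, frame t, frequency bin f.
    weights[s][c][f]: complex weight for source s, channel c, frequency bin f
                      (assumed constant over time in this sketch).
    Returns outputs[s][t][f] = sum over c of conj(weights[s][c][f]) * mixture[c][t][f].
    """
    n_channels = len(mixture)
    n_frames = len(mixture[0])
    n_bins = len(mixture[0][0])
    outputs = []
    for source_weights in weights:
        estimate = [
            [
                sum(
                    complex(source_weights[c][f]).conjugate() * mixture[c][t][f]
                    for c in range(n_channels)
                )
                for f in range(n_bins)
            ]
            for t in range(n_frames)
        ]
        outputs.append(estimate)
    return outputs
```

For example, with two channels holding `1+1j` and `1-1j` in a single time-frequency bin, weights `(0.5, 0.5)` recover the component common to both channels and weights `(0.5, -0.5)` recover the component that differs between them, which is the multiple-outputs idea in miniature.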

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Adaptive Filtering Techniques