Sequential Multi-Frame Neural Beamforming for Speech Separation and Enhancement
Zhong-Qiu Wang, Hakan Erdogan, Scott Wisdom, Kevin Wilson, Desh Raj, Shinji Watanabe, Zhuo Chen, John R. Hershey

TL;DR
This paper presents a sequential neural beamforming approach combining spectral separation and spatial beamforming, significantly improving speech separation and enhancement performance in reverberant conditions.
Contribution
It introduces a multi-frame neural beamforming method that adds contextual frames to the beamforming formulation and explores time-varying covariance estimation (factorized and block-based), achieving state-of-the-art results in speech separation.
Findings
Average 2.75 dB SI-SNR improvement
14.2% reduction in speech recognition error
Effective separation of real recordings in LibriCSS
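For reference, SI-SNR (scale-invariant signal-to-noise ratio), the metric behind the first finding, is commonly computed as sketched below. This is a minimal NumPy version of the standard definition, not the paper's evaluation code; the function name `si_snr` and the `eps` constant are illustrative assumptions.

```python
import numpy as np

def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB between two 1-D time-domain signals."""
    # Remove the mean so a constant offset does not affect the score.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to get the scaled target part.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    # Everything left after removing the target part counts as error.
    noise = estimate - target
    return 10 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps))
```

An "SI-SNR improvement" is then the SI-SNR of the processed output minus the SI-SNR of the unprocessed mixture, both measured against the same reference signal.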
Abstract
This work introduces sequential neural beamforming, which alternates between neural network based spectral separation and beamforming based spatial separation. Our neural networks for separation use an advanced convolutional architecture trained with a novel stabilized signal-to-noise ratio loss function. For beamforming, we explore multiple ways of computing time-varying covariance matrices, including factorizing the spatial covariance into a time-varying amplitude component and a time-invariant spatial component, as well as using block-based techniques. In addition, we introduce a multi-frame beamforming method which improves the results significantly by adding contextual frames to the beamforming formulations. We extensively evaluate and analyze the effects of window size, block size, and multi-frame context size for these methods. Our best method utilizes a sequence of three neural…
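The multi-frame idea in the abstract can be illustrated with a small sketch. It stacks neighbouring STFT frames into one longer observation vector per time-frequency bin, then solves a time-invariant multichannel Wiener filter against a target estimate (which, in the paper's pipeline, would come from a separation network). This is one common formulation under stated assumptions, not necessarily the authors' exact method: the function names, the zero-padding at the edges, the diagonal loading, and the default context sizes are all hypothetical choices made for illustration.

```python
import numpy as np

def stack_context(Y, left, right):
    """Stack context frames. Y: (C, F, T) complex STFT.
    Returns (C * (left + right + 1), F, T), zero-padded at the edges."""
    C, F, T = Y.shape
    padded = np.pad(Y, ((0, 0), (0, 0), (left, right)))
    # Slice k holds frame t - left + k for every output frame t.
    frames = [padded[:, :, k:k + T] for k in range(left + right + 1)]
    return np.concatenate(frames, axis=0)

def multiframe_wiener(Y, s_ref, left=1, right=1, diag_load=1e-6):
    """Time-invariant multi-frame Wiener filter (illustrative sketch).
    Y: (C, F, T) mixture STFT; s_ref: (F, T) estimated target at a
    reference microphone. Returns the filtered (F, T) target estimate."""
    Yt = stack_context(Y, left, right)                      # (D, F, T)
    D, F, T = Yt.shape
    # Per-frequency covariance of the stacked mixture: (F, D, D).
    phi_yy = np.einsum('dft,eft->fde', Yt, Yt.conj()) / T
    # Cross-covariance between stacked mixture and target: (F, D).
    phi_ys = np.einsum('dft,ft->fd', Yt, s_ref.conj()) / T
    # Diagonal loading (an assumption here) keeps the solve well conditioned.
    trace = np.einsum('fdd->f', phi_yy).real
    phi_yy = phi_yy + (diag_load * trace / D)[:, None, None] * np.eye(D)
    # w = phi_yy^{-1} phi_ys, solved independently for each frequency.
    w = np.linalg.solve(phi_yy, phi_ys[..., None])[..., 0]  # (F, D)
    # Beamformer output: w^H applied to the stacked mixture.
    return np.einsum('fd,dft->ft', w.conj(), Yt)
```

Setting `left=0, right=0` reduces this to an ordinary single-frame filter, which makes the effect of added context easy to compare. Time-varying variants such as the block-based covariances mentioned in the abstract would replace the global averages over `T` with averages over local blocks of frames.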
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Hearing Loss and Rehabilitation
