TL;DR
FaSNet is a low-latency, time-domain neural beamforming method that adaptively filters multi-microphone signals, outperforming traditional methods in noisy and reverberant environments, and reducing speech recognition errors.
Contribution
Introduces FaSNet, a novel filter-and-sum neural network for low-latency adaptive beamforming in multi-microphone audio processing.
Findings
Outperforms several traditional oracle beamformers on scale-invariant signal-to-noise ratio (SI-SNR).
Achieves a 14.3% relative word error rate (WER) reduction on the CHiME-3 dataset.
Effective in reverberant and noisy conditions.
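The two metrics named in these findings can be made concrete with a short sketch. The SI-SNR function below follows the commonly used definition (zero-mean signals, with the estimate projected onto the target). It is not taken from the paper's code. The function names and the WER values in the usage lines are illustrative assumptions, not results reported by the authors.

```python
import numpy as np


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two 1-D signals (common definition)."""
    # Remove the mean so a DC offset does not affect the score.
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target; this makes the metric
    # insensitive to the overall gain of the estimate.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = scale * target
    residual = estimate - projection
    return 10.0 * np.log10(
        (np.dot(projection, projection) + eps) / (np.dot(residual, residual) + eps)
    )


def relative_wer_reduction(baseline_wer, new_wer):
    """Relative WER reduction as a fraction of the baseline WER."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(16000)
    noisy = clean + 0.1 * rng.standard_normal(16000)
    print("SI-SNR (dB):", si_snr(noisy, clean))
    # Hypothetical WER values, chosen only to illustrate the arithmetic.
    print("Relative WER reduction:", relative_wer_reduction(10.0, 8.0))
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate leaves the score essentially unchanged, which is why SI-SNR is preferred over plain SNR when the output gain of a system is arbitrary.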
Abstract
Beamforming has been extensively investigated for multi-channel audio processing tasks. Recently, learning-based beamforming methods, sometimes called neural beamformers, have achieved significant improvements in both signal quality (e.g. signal-to-noise ratio (SNR)) and speech recognition (e.g. word error rate (WER)). Such systems are generally non-causal and require a large context for robust estimation of inter-channel features, which is impractical in applications requiring low-latency responses. In this paper, we propose filter-and-sum network (FaSNet), a time-domain, filter-based beamforming approach suitable for low-latency scenarios. FaSNet has a two-stage system design that first learns frame-level time-domain adaptive beamforming filters for a selected reference channel, and then calculates the filters for all remaining channels. The filtered outputs at all channels…
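The filter-and-sum operation that the abstract refers to can be sketched as follows. This is a minimal illustration of the generic technique only: in FaSNet the per-frame filters are estimated by a neural network, whereas here they are simply passed in as an array. The array layout, the function name `filter_and_sum`, and the choice of a context window longer than the output frame are assumptions made for this example.

```python
import numpy as np


def filter_and_sum(frames, filters):
    """Apply per-channel, per-frame FIR filters and sum across channels.

    frames:  (channels, n_frames, context_len) windowed time-domain input
    filters: (channels, n_frames, filt_len) one filter per channel and frame
    returns: (n_frames, context_len - filt_len + 1) beamformed frames
    """
    n_ch, n_frames, context_len = frames.shape
    filt_len = filters.shape[-1]
    out = np.zeros((n_frames, context_len - filt_len + 1))
    for c in range(n_ch):
        for t in range(n_frames):
            # "valid" convolution keeps only samples where the filter
            # fully overlaps the context window.
            out[t] += np.convolve(frames[c, t], filters[c, t], mode="valid")
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical sizes: 4 microphones, 3 frames, 20-sample context, 5-tap filters.
    frames = rng.standard_normal((4, 3, 20))
    # Stand-in filters; a real system would predict these from the input.
    filters = rng.standard_normal((4, 3, 5))
    print(filter_and_sum(frames, filters).shape)  # (3, 16)
```

Keeping the filters short and frame-local is what makes this style of beamforming compatible with low latency: each output frame depends only on a small window of input samples around it.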
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Advanced Adaptive Filtering Techniques
