FaSNet: Low-latency Adaptive Beamforming for Multi-microphone Audio Processing

Yi Luo; Enea Ceolini; Cong Han; Shih-Chii Liu; Nima Mesgarani

arXiv:1909.13387·eess.AS·October 2, 2019·30 cites

TL;DR

FaSNet is a low-latency, time-domain neural beamforming method that adaptively filters multi-microphone signals, outperforming traditional methods in noisy and reverberant environments, and reducing speech recognition errors.

Contribution

Introduces FaSNet, a novel filter-and-sum neural network for low-latency adaptive beamforming in multi-microphone audio processing.

Findings

01. Outperforms traditional oracle beamformers in SI-SNR metrics.

02. Achieves a 14.3% relative WER reduction on the CHiME-3 dataset.

03. Effective in reverberant and noisy conditions.
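
The first finding is stated in terms of SI-SNR (scale-invariant signal-to-noise ratio). As a minimal sketch, assuming the commonly used definition of the metric (zero-mean signals, with the estimate projected onto the target), it could be computed as follows. The function name `si_snr` and the `eps` parameter are illustrative choices, not taken from the paper or its repository.

```python
import numpy as np


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two 1-D signals.

    Assumption: this follows the commonly used definition of the metric;
    it is not copied from the paper's evaluation code.
    """
    estimate = np.asarray(estimate, dtype=float)
    target = np.asarray(target, dtype=float)
    # Remove the mean so that a DC offset does not affect the score.
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to get the scaled target component.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    s_target = scale * target
    # Everything left over is treated as noise or distortion.
    e_noise = estimate - s_target
    ratio = (np.dot(s_target, s_target) + eps) / (np.dot(e_noise, e_noise) + eps)
    return 10.0 * np.log10(ratio)
```

Because of the projection step, rescaling the estimate leaves the score essentially unchanged, which is the property that gives the metric its name.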

Abstract

Beamforming has been extensively investigated for multi-channel audio processing tasks. Recently, learning-based beamforming methods, sometimes called neural beamformers, have achieved significant improvements in both signal quality (e.g. signal-to-noise ratio (SNR)) and speech recognition (e.g. word error rate (WER)). Such systems are generally non-causal and require a large context for robust estimation of inter-channel features, which is impractical in applications requiring low-latency responses. In this paper, we propose the filter-and-sum network (FaSNet), a time-domain, filter-based beamforming approach suitable for low-latency scenarios. FaSNet has a two-stage system design that first learns frame-level time-domain adaptive beamforming filters for a selected reference channel, and then calculates the filters for all remaining channels. The filtered outputs at all channels…
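
The filter-and-sum operation that gives FaSNet its name can be sketched for a single frame as below. This is a hedged illustration, not the authors' implementation: in FaSNet the per-channel filters are estimated by a neural network, whereas here they are simply passed in as arrays. The function name `filter_and_sum_frame`, the argument shapes, and the use of "valid" convolution over a context window are all assumptions made for the example.

```python
import numpy as np


def filter_and_sum_frame(windows, filters):
    """Apply one time-domain FIR filter per channel and sum the results.

    windows: array of shape (channels, W), one context window per microphone.
    filters: array of shape (channels, K), one FIR filter per microphone.
    Returns a beamformed frame of length W - K + 1 (for K <= W).

    Assumption: the filters are given. In FaSNet they would be produced
    frame by frame by the network, which is not modelled here.
    """
    windows = np.asarray(windows, dtype=float)
    filters = np.asarray(filters, dtype=float)
    if windows.shape[0] != filters.shape[0]:
        raise ValueError("windows and filters must have the same number of channels")
    # Filter each channel; "valid" keeps only the fully overlapping samples.
    filtered = [np.convolve(w, h, mode="valid") for w, h in zip(windows, filters)]
    # Sum across channels to form the beamformed output frame.
    return np.sum(filtered, axis=0)


if __name__ == "__main__":
    # Two channels, window length 5, filter length 3 (hypothetical sizes).
    windows = np.array([[1.0, 2.0, 3.0, 4.0, 5.0],
                        [0.5, 0.5, 0.5, 0.5, 0.5]])
    # A unit impulse at the centre tap passes each channel through unchanged.
    filters = np.array([[0.0, 1.0, 0.0],
                        [0.0, 1.0, 0.0]])
    print(filter_and_sum_frame(windows, filters))
```

With centre-tap impulse filters the output is just the sum of the central samples of each window, which makes the sketch easy to check by hand before substituting learned filters.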

Code & Models

Repositories

yluo42/TAC (PyTorch)

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Advanced Adaptive Filtering Techniques