Learning Filterbanks for End-to-End Acoustic Beamforming

Samuele Cornell; Manuel Pariente; François Grondin; Stefano Squartini

arXiv:2111.04614 · eess.AS · February 22, 2022

Open Access

TL;DR

This paper introduces a fully end-to-end neural beamforming approach that learns the analysis and synthesis filterbanks jointly with the DNN, bridging the gap between short-window monaural separation and long-window conventional beamforming. With short windows, the learned filterbanks can surpass oracle-mask-based beamforming.

Contribution

It proposes learning the analysis and synthesis filterbanks jointly with the neural beamformer, and reports that this outperforms oracle-mask-based beamforming when short windows are used.

Findings

01

Learned filterbanks can surpass oracle-mask-based beamforming for short windows.

02

Jointly trained filterbanks improve end-to-end acoustic beamforming performance.

03

The approach bridges the gap between monaural separation and traditional beamforming techniques.

Abstract

Recent work on monaural source separation has shown that performance can be increased by using fully learned filterbanks with short windows. On the other hand, it is widely known that, for conventional beamforming techniques, performance increases with long analysis windows. This also applies to most hybrid neural beamforming methods, which rely on a deep neural network (DNN) to estimate the spatial covariance matrices. In this work we try to bridge the gap between these two worlds and explore fully end-to-end hybrid neural beamforming in which, instead of using the Short-Time Fourier Transform, the analysis and synthesis filterbanks are also learnt jointly with the DNN. In detail, we explore two different types of learned filterbanks: fully learned and analytic. We perform a detailed analysis using the recent Clarity Challenge data and show that by using learnt filterbanks it is possible…
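
To make the idea of an analysis/synthesis filterbank with a short window concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's implementation: the function names (`frame`, `analysis`, `synthesis`), the 16-sample window, the 8-sample hop, and the random orthonormal matrix standing in for trained filters are all hypothetical. In a real system the filter matrix would be a trainable parameter updated jointly with the DNN.

```python
import numpy as np

def frame(signal, win, hop):
    """Slice a 1-D signal into overlapping frames of length `win`."""
    n_frames = 1 + (len(signal) - win) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]  # shape: (n_frames, win)

def analysis(signal, filters, hop):
    """Analysis filterbank: project every frame onto every filter."""
    return frame(signal, filters.shape[1], hop) @ filters.T  # (n_frames, n_filters)

def synthesis(coeffs, filters, hop):
    """Synthesis filterbank: rebuild frames from coefficients, then overlap-add."""
    frames = coeffs @ filters  # (n_frames, win)
    win = filters.shape[1]
    out = np.zeros(hop * (len(frames) - 1) + win)
    for i, f in enumerate(frames):
        out[i * hop : i * hop + win] += f
    return out

# Assumed toy setup: 16-sample window (1 ms at 16 kHz), 50% overlap.
rng = np.random.default_rng(0)
win, hop = 16, 8
signal = rng.standard_normal(160)

# Stand-in for learned filters: a random orthonormal basis (rows are filters).
# With trained filters, the synthesis bank would be a separate learned matrix.
filters, _ = np.linalg.qr(rng.standard_normal((win, win)))

coeffs = analysis(signal, filters, hop)   # replaces the STFT
recon = synthesis(coeffs, filters, hop)   # replaces the inverse STFT

# With 50% overlap and no tapering window, interior samples are summed twice.
interior_error = np.max(np.abs(recon[hop:-hop] / 2 - signal[hop:-hop]))
```

Because the stand-in filters are orthonormal, the interior of `recon` equals twice the input, so `interior_error` is at floating-point precision. Learned filters carry no such guarantee; reconstruction quality then depends on training.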

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis