Multi-Channel Masking with Learnable Filterbank for Sound Source Separation

Wang Dai; Archontis Politis; Tuomas Virtanen

arXiv:2303.07816·eess.AS·March 15, 2023·1 cites

TL;DR

This paper introduces a multi-channel masking framework built on a learnable filterbank for sound source separation. It estimates a separate mask for each microphone channel, which improves separation over single-channel masking with a learnable filterbank.

Contribution

It proposes a novel multi-channel masking framework using a learnable 1D Conv filterbank, enhancing separation by applying channel-specific masks in a learned feature domain.

Findings

01

Outperforms single-channel masking with learnable filterbank

02

Can surpass multi-channel complex masking with STFT in certain models

03

Demonstrates spatial selectivity in the learned filterbank domain

Abstract

This work proposes a multi-channel masking framework based on a learnable filterbank for multi-channel source separation. The learnable filterbank is a 1D Conv layer, which transforms the raw waveform into a 2D representation. In contrast to the conventional single-channel masking method, we estimate a mask for each individual microphone channel. The estimated masks are then applied to the transformed waveform representation, as in the traditional filter-and-sum beamforming operation. Specifically, each mask multiplies the corresponding channel's 2D representation, and the masked outputs of all channels are then summed. Finally, a 1D transposed Conv layer converts the summed masked signal back into the waveform domain. The experimental results show our method outperforms single-channel masking with a learnable filterbank and can outperform multi-channel complex masking…
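
The mask-and-sum pipeline described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the filterbank weights are random rather than learned, the same encoder is assumed to be shared across microphone channels, the masks are random placeholders standing in for the output of a separation network, and all names (`encode`, `decode`, `mask_and_sum`) and sizes are hypothetical.

```python
import numpy as np

def encode(x, filters, stride):
    """Strided 1D convolution: (T,) waveform -> (N, K) 2D representation."""
    n_filters, win = filters.shape
    n_frames = (len(x) - win) // stride + 1
    # Slice the waveform into overlapping frames, one column per frame.
    frames = np.stack(
        [x[k * stride : k * stride + win] for k in range(n_frames)], axis=1
    )  # (win, K)
    return filters @ frames  # (N, K)

def decode(rep, basis, stride):
    """1D transposed convolution: (N, K) representation -> waveform (overlap-add)."""
    n_filters, win = basis.shape
    n_frames = rep.shape[1]
    out = np.zeros((n_frames - 1) * stride + win)
    frames = basis.T @ rep  # (win, K)
    for k in range(n_frames):
        out[k * stride : k * stride + win] += frames[:, k]
    return out

def mask_and_sum(x_multi, masks, filters, basis, stride):
    """Apply one mask per channel in the filterbank domain, sum, then decode."""
    reps = np.stack([encode(ch, filters, stride) for ch in x_multi])  # (C, N, K)
    summed = (masks * reps).sum(axis=0)  # filter-and-sum over channels -> (N, K)
    return decode(summed, basis, stride)

# Hypothetical sizes chosen only for illustration.
rng = np.random.default_rng(0)
n_channels, n_samples, n_filters, win, stride = 4, 1600, 64, 16, 8
x_multi = rng.standard_normal((n_channels, n_samples))  # multi-channel mixture
filters = rng.standard_normal((n_filters, win))         # stand-in for learned encoder
basis = rng.standard_normal((n_filters, win))           # stand-in for learned decoder
n_frames = (n_samples - win) // stride + 1
# Assumption: a network would estimate these masks; random values are used here.
masks = rng.uniform(0.0, 1.0, size=(n_channels, n_filters, n_frames))

est = mask_and_sum(x_multi, masks, filters, basis, stride)
```

With these sizes the decoded estimate has the same length as the input waveform. In single-channel masking, by contrast, only one reference channel's representation would be masked, so the sum over channels is the step that lets the masks act as spatial filters.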

Taxonomy

Topics: Speech and Audio Processing · Acoustic Wave Phenomena Research · Music and Audio Processing