Implicit Filter-and-sum Network for Multi-channel Speech Separation

Yi Luo; Nima Mesgarani

arXiv:2011.08401 · eess.AS · November 18, 2020 · 1 citation

TL;DR

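This paper introduces iFaSNet, an improved version of FaSNet for multi-channel speech separation that performs filter-and-sum implicitly in the latent space of the reference microphone and uses feature-level normalized cross-correlation (NCC) features, achieving significant performance gains over FaSNet.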

Contribution

The paper proposes an implicit filter-and-sum formulation and feature-level normalized cross-correlation (NCC) features to improve FaSNet's multi-channel speech separation performance.
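
As the abstract describes, the implicit formulation replaces FaSNet's waveform-level filter-and-sum across all microphones with a filter-and-sum over a context of frames in the latent space of the reference microphone only. The toy sketch below illustrates that idea under stated assumptions: the latent frames, the hand-supplied per-offset weight vectors, the elementwise product, and the function name `implicit_filter_and_sum` are all hypothetical. In iFaSNet the weights would be produced by the network, so this is not the paper's implementation.

```python
from typing import List

Frame = List[float]  # one latent feature vector (hypothetical encoder output)


def implicit_filter_and_sum(latent: List[Frame], weights: List[Frame], t: int) -> Frame:
    """Toy context-level filter-and-sum in a latent space.

    latent:  latent frames of the reference microphone only, shape [T][D].
    weights: one D-dim weight vector per context offset -C..C, shape [2C+1][D]
             (hand-supplied here; a hypothetical stand-in for network outputs).
    Returns the sum over c of weights[c + C] * latent[t + c] (elementwise),
    skipping context frames that fall outside [0, T), i.e. zero padding.
    """
    if len(weights) % 2 != 1:
        raise ValueError("expected an odd number of context weights (2C+1)")
    half = len(weights) // 2
    dim = len(latent[0])
    out = [0.0] * dim
    for offset, w in enumerate(weights):
        idx = t + offset - half  # context frame index for this offset
        if 0 <= idx < len(latent):
            for d in range(dim):
                out[d] += w[d] * latent[idx][d]
    return out


# Usage: weight the centre frame fully and its two neighbours by half.
latent = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(implicit_filter_and_sum(latent, [[0.5, 0.5], [1.0, 1.0], [0.5, 0.5]], t=1))  # [6.0, 8.0]
```

With weights of one at the centre offset and zero elsewhere, the operation returns the reference frame unchanged; other weights blend neighbouring context frames into the output for frame t.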

Findings

01. iFaSNet outperforms FaSNet across all tested conditions.
02. The implicit formulation better matches the end-to-end separation objective.
03. Feature-level NCC improves the model's feature representation.
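
Normalized cross-correlation (NCC) measures how similar a reference-microphone frame is to time-shifted frames of another microphone, which encodes inter-channel delay information. The abstract says iFaSNet modifies FaSNet's sample-level NCC calculation; the minimal sketch below computes a generic sample-level NCC (a cosine similarity per lag) and is not the paper's feature-level variant. The function name `ncc_per_lag`, the lag range, and zero padding at the signal edges are assumptions.

```python
import math
from typing import List


def ncc_per_lag(ref_frame: List[float], other: List[float], start: int, max_lag: int) -> List[float]:
    """Sample-level NCC (cosine similarity) for lags -max_lag..max_lag.

    ref_frame: L samples of the reference microphone starting at `start`.
    other:     the full signal of another microphone.
    For each lag, ref_frame is compared with other[start + lag : start + lag + L];
    samples outside `other` are treated as zeros (an assumption).
    """
    length = len(ref_frame)
    ref_norm = math.sqrt(sum(x * x for x in ref_frame))
    scores = []
    for lag in range(-max_lag, max_lag + 1):
        seg = [
            other[i] if 0 <= i < len(other) else 0.0
            for i in range(start + lag, start + lag + length)
        ]
        seg_norm = math.sqrt(sum(x * x for x in seg))
        dot = sum(a * b for a, b in zip(ref_frame, seg))
        denom = ref_norm * seg_norm
        scores.append(dot / denom if denom > 0 else 0.0)  # 0 for silent segments
    return scores


# Usage: the second channel is the reference delayed by one sample.
ref = [0.0, 0.0, 1.0, 2.0, 3.0, 0.0, 0.0, 0.0]
other = [0.0] + ref[:-1]
scores = ncc_per_lag(ref[2:5], other, start=2, max_lag=2)
print([round(s, 3) for s in scores])  # [0.0, 0.802, 0.956, 1.0, 0.593]
```

Because the second channel lags the reference by one sample, the similarity peaks at lag +1.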

Abstract

Various neural network architectures have been proposed in recent years for the task of multi-channel speech separation. Among them, the filter-and-sum network (FaSNet) performs end-to-end time-domain filter-and-sum beamforming and has been shown to be effective in both ad-hoc and fixed microphone array geometries. In this paper, we investigate multiple ways to improve the performance of FaSNet. From the problem formulation perspective, we change the explicit time-domain filter-and-sum operation, which involves all the microphones, into an implicit filter-and-sum operation in the latent space of only the reference microphone. The filter-and-sum operation is applied to a context around the frame to be separated. This allows the problem formulation to better match the objective of end-to-end separation. From the feature extraction perspective, we modify the calculation of sample-level normalized cross…
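
For reference, an explicit time-domain filter-and-sum over M microphones applies an FIR filter h_m to each channel x_m and sums the results: y[t] = Σ_m Σ_k h_m[k] x_m[t − k]. The sketch below uses fixed, hand-chosen filters and a hypothetical function name `filter_and_sum`; FaSNet instead estimates its filters with a neural network, so this only illustrates the operation the abstract refers to, not the model.

```python
from typing import List


def filter_and_sum(signals: List[List[float]], filters: List[List[float]]) -> List[float]:
    """Explicit filter-and-sum: y[t] = sum_m sum_k h_m[k] * x_m[t - k].

    signals: M microphone channels, each of length T.
    filters: M causal FIR filters h_m (lengths may differ).
    The output is truncated to length T.
    """
    if len(signals) != len(filters):
        raise ValueError("need exactly one filter per channel")
    num_samples = len(signals[0])
    y = [0.0] * num_samples
    for x, h in zip(signals, filters):  # filter each channel, then accumulate
        for t in range(num_samples):
            for k, coef in enumerate(h):
                if t - k >= 0:
                    y[t] += coef * x[t - k]
    return y


# Usage: channel 2 is channel 1 delayed by one sample; delaying channel 1
# by one sample (h1 = [0, 1]) aligns them, i.e. delay-and-sum beamforming.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [0.0, 1.0, 2.0, 3.0]
print(filter_and_sum([x1, x2], [[0.0, 1.0], [1.0]]))  # [0.0, 2.0, 4.0, 6.0]
```

Delay-and-sum is the special case where each filter is a pure delay; a learned filter-and-sum can also shape the spectrum of each channel.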

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis