Multichannel Speech Separation and Enhancement Using the Convolutive Transfer Function

Xiaofei Li; Laurent Girin; Sharon Gannot; Radu Horaud

arXiv:1711.07911 · cs.SD · January 31, 2019

TL;DR

This paper introduces three novel multichannel speech separation and enhancement methods based on the convolutive transfer function approximation, demonstrating improved performance and reduced complexity in noisy, convolutive environments.

Contribution

The paper proposes three new methods for multichannel speech separation in the CTF domain: inverse filtering, beamforming-like inverse filtering, and sparse recovery with Lasso. Working in the CTF domain lowers the computational cost relative to time-domain filters, and the beamforming-like method applies when not all source CTFs are known.

Findings

01 The methods outperform baseline techniques in various acoustic scenarios.

02 The CTF domain reduces computational complexity compared to time-domain filters.

03 The Lasso-based method effectively exploits spectral sparsity for source recovery.

Abstract

This paper addresses the problem of speech separation and enhancement from multichannel convolutive and noisy mixtures, assuming known mixing filters. We propose to perform the speech separation and enhancement task in the short-time Fourier transform domain, using the convolutive transfer function (CTF) approximation. Compared to time-domain filters, the CTF has far fewer taps; consequently, it has fewer near-common zeros among channels and lower computational complexity. The work proposes three speech-source recovery methods, namely: i) the multichannel inverse filtering method, i.e. the multiple input/output inverse theorem (MINT), is exploited in the CTF domain, and for the multi-source case, ii) a beamforming-like multichannel inverse filtering method applying single source MINT and using power minimization, which is suitable whenever the source CTFs are not all known, and iii) a…
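
To make the first method more concrete, the following is a minimal sketch of MINT-style inverse filtering in the CTF domain for a single frequency bin, assuming one source, no noise, and known CTFs. It is an illustration under those assumptions, not the authors' implementation: the function names (`convolution_matrix`, `ctf_mint`), the random CTFs, and the chosen filter length are all hypothetical. Under the CTF approximation, each microphone's STFT coefficients in a given bin are modeled as the source coefficients convolved along the frame axis with a short channel-specific filter; the sketch looks for inverse filters whose summed convolution with the CTFs equals an impulse.

```python
import numpy as np

def convolution_matrix(a, n_h):
    """Matrix C of shape (len(a) + n_h - 1, n_h) with C @ h == np.convolve(a, h)."""
    n_a = len(a)
    C = np.zeros((n_a + n_h - 1, n_h), dtype=complex)
    for j in range(n_h):
        C[j:j + n_a, j] = a
    return C

def ctf_mint(ctfs, n_h, delay=0):
    """Least-squares inverse filters for one frequency bin (hypothetical helper).

    ctfs : (I, Q) complex array, one length-Q CTF per channel.
    Returns h of shape (I, n_h) such that
    sum_i np.convolve(ctfs[i], h[i]) approximates an impulse at index `delay`.
    """
    n_ch, n_taps = ctfs.shape
    # Stack the per-channel convolution matrices side by side.
    A = np.hstack([convolution_matrix(ctfs[i], n_h) for i in range(n_ch)])
    target = np.zeros(n_taps + n_h - 1, dtype=complex)
    target[delay] = 1.0
    h, *_ = np.linalg.lstsq(A, target, rcond=None)
    return h.reshape(n_ch, n_h)

# Toy data (assumption): 3 channels, 4-tap random CTFs, 50 STFT frames.
rng = np.random.default_rng(0)
I, Q, P = 3, 4, 50
ctfs = rng.standard_normal((I, Q)) + 1j * rng.standard_normal((I, Q))
s = rng.standard_normal(P) + 1j * rng.standard_normal(P)

# Microphone STFT coefficients in this bin: frame-wise convolution with each CTF.
x = np.stack([np.convolve(ctfs[i], s) for i in range(I)])

# An exact inverse needs I * n_h >= Q + n_h - 1; n_h = 3 satisfies this here.
n_h = 3
h = ctf_mint(ctfs, n_h)

# Apply the inverse filters and sum over channels to recover the source.
s_hat = sum(np.convolve(h[i], x[i]) for i in range(I))[:P]
err = np.max(np.abs(s_hat - s))
print(f"max reconstruction error: {err:.2e}")
```

In a full system this solve would be repeated independently for every frequency bin. The short filter length per bin is what the abstract refers to when it says the CTF has far fewer taps than a time-domain filter.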

Taxonomy

Topics: Speech and Audio Processing · Advanced Adaptive Filtering Techniques · Blind Source Separation Techniques