Multi-Channel Target Speaker Extraction with Refinement: The WavLab Submission to the Second Clarity Enhancement Challenge

Samuele Cornell; Zhong-Qiu Wang; Yoshiki Masuyama; Shinji Watanabe; Manuel Pariente; Nobutaka Ono

arXiv:2302.07928·eess.AS·February 17, 2023·5 cites

Open Access

TL;DR

This paper introduces iNeuBe-X, a multi-channel target speaker extraction system for hearing aids that combines neural networks with beamforming under latency constraints. It reports a HASPI score of 0.942 and an SI-SDRi of 18.8 dB.

Contribution

The paper extends the iNeuBe framework with a multi-channel, causal TF-GridNet architecture that replaces TCNDenseNet, together with speaker conditioning and fine-tuning for target speaker extraction under strict latency constraints.

Findings

01

Achieved a hearing-aid speech perception index (HASPI) score of 0.942 on the challenging CEC2 data

02

Attained an SI-SDRi of 18.8 dB without external data

03

Demonstrated effective target speaker extraction in noisy-reverberant environments
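
For readers unfamiliar with the SI-SDRi figure above, the following is a minimal sketch of how a scale-invariant signal-to-distortion ratio and its improvement over the unprocessed mixture are commonly computed. It is an illustration using NumPy, not the evaluation code used by the authors or by the challenge; the function names `si_sdr` and `si_sdr_improvement` and the synthetic signals are assumptions made for this example.

```python
import numpy as np


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB between a 1-D estimate and a 1-D reference."""
    estimate = np.asarray(estimate, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    # Remove the mean so a constant offset does not affect the score.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to get the scaled target.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    residual = estimate - target
    return 10.0 * np.log10(
        (np.dot(target, target) + eps) / (np.dot(residual, residual) + eps)
    )


def si_sdr_improvement(estimate, mixture, reference):
    """SI-SDRi: SI-SDR of the estimate minus SI-SDR of the unprocessed mixture."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)


# Synthetic example (hypothetical data): white-noise "speech" plus white noise.
rng = np.random.default_rng(0)
reference = rng.standard_normal(16000)
noise = rng.standard_normal(16000)
mixture = reference + noise          # roughly 0 dB SI-SDR
estimate = reference + 0.1 * noise   # noise attenuated by a factor of 10
improvement = si_sdr_improvement(estimate, mixture, reference)
print(f"SI-SDRi: {improvement:.1f} dB")
```

In this toy setup, attenuating the noise amplitude by a factor of 10 yields an improvement of roughly 20 dB, which gives a sense of scale for the reported 18.8 dB.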

Abstract

This paper describes our submission to the Second Clarity Enhancement Challenge (CEC2), which consists of target speech enhancement for hearing-aid (HA) devices in noisy-reverberant environments with multiple interferers such as music and competing speakers. Our approach builds upon the powerful iterative neural/beamforming enhancement (iNeuBe) framework introduced in our recent work, and this paper extends it for target speaker extraction. We therefore name the proposed approach iNeuBe-X, where the X stands for extraction. To address the challenges encountered in the CEC2 setting, we introduce four major novelties: (1) we extend the state-of-the-art TF-GridNet model, originally designed for monaural speaker separation, to multi-channel, causal speech enhancement, and observe large improvements when replacing the TCNDenseNet used in iNeuBe with this new architecture; (2) we…

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Hearing Loss and Rehabilitation