Multi-Channel Target Speaker Extraction with Refinement: The WavLab Submission to the Second Clarity Enhancement Challenge
Samuele Cornell, Zhong-Qiu Wang, Yoshiki Masuyama, Shinji Watanabe, Manuel Pariente, Nobutaka Ono

TL;DR
This paper introduces iNeuBe-X, a multi-channel speech enhancement method for hearing aids that combines neural networks and beamforming with target speaker extraction under a strict latency constraint. It reports a high HASPI score and a large SI-SDR improvement.
Contribution
The paper extends the iNeuBe framework for hearing-aid speech enhancement with a multi-channel, causal TF-GridNet architecture, speaker conditioning, and fine-tuning for target speaker extraction under a strict latency constraint.
Findings
Achieved a HASPI score of 0.942 on challenging data
Attained an SI-SDRi of 18.8 dB without external data
Demonstrated effective target speaker extraction in noisy-reverberant environments
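For readers unfamiliar with the SI-SDRi figure quoted above, the following is a minimal sketch of how scale-invariant signal-to-distortion ratio (SI-SDR) and its improvement over the unprocessed mixture are commonly computed. It follows the standard definition of the metric; the summary does not state which implementation the authors used, and the function names `si_sdr` and `si_sdr_improvement`, the `eps` stabiliser, and the synthetic sine/cosine signals are all assumptions made for illustration.

```python
import math


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant SDR in dB (common definition; hypothetical helper)."""
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")
    n = len(reference)
    # Remove the mean so a constant offset does not influence the score.
    mean_e = sum(estimate) / n
    mean_r = sum(reference) / n
    e = [x - mean_e for x in estimate]
    r = [x - mean_r for x in reference]
    # Project the estimate onto the reference to get the scaled target.
    alpha = sum(a * b for a, b in zip(e, r)) / (sum(b * b for b in r) + eps)
    target = [alpha * b for b in r]
    # Everything not explained by the scaled reference counts as distortion.
    residual = [a - t for a, t in zip(e, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(x * x for x in residual)
    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))


def si_sdr_improvement(estimate, mixture, reference):
    """SI-SDRi: gain in SI-SDR of the estimate over the unprocessed mixture."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)


# Synthetic illustration (not data from the paper): a sine "target" plus an
# orthogonal cosine "interferer" of equal energy.
num_samples = 1000
reference = [math.sin(2 * math.pi * 5 * k / num_samples) for k in range(num_samples)]
interferer = [math.cos(2 * math.pi * 5 * k / num_samples) for k in range(num_samples)]
mixture = [s + v for s, v in zip(reference, interferer)]
# Hypothetical enhancer output: interferer attenuated to 10% of its amplitude.
estimate = [s + 0.1 * v for s, v in zip(reference, interferer)]

improvement_db = si_sdr_improvement(estimate, mixture, reference)
```

In this toy case the mixture sits near 0 dB SI-SDR and the interferer is attenuated tenfold in amplitude, so `improvement_db` comes out close to 20 dB. Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves the score unchanged, which is what makes the metric scale-invariant.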
Abstract
This paper describes our submission to the Second Clarity Enhancement Challenge (CEC2), which consists of target speech enhancement for hearing-aid (HA) devices in noisy-reverberant environments with multiple interferers such as music and competing speakers. Our approach builds upon the powerful iterative neural/beamforming enhancement (iNeuBe) framework introduced in our recent work, and this paper extends it for target speaker extraction. We therefore name the proposed approach as iNeuBe-X, where the X stands for extraction. To address the challenges encountered in the CEC2 setting, we introduce four major novelties: (1) we extend the state-of-the-art TF-GridNet model, originally designed for monaural speaker separation, for multi-channel, causal speech enhancement, and large improvements are observed by replacing the TCNDenseNet used in iNeuBe with this new architecture; (2) we…
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Hearing Loss and Rehabilitation
