Leveraging Spatial Cues from Cochlear Implant Microphones to Efficiently Enhance Speech Separation in Real-World Listening Scenes

Feyisayo Olalere; Kiki van der Heijden; Christiaan H. Stronks; Jeroen Briaire; Johan HM Frijns; Marcel van Gerven

arXiv:2501.14610·cs.SD·January 27, 2025

Open Access

TL;DR

This paper investigates how spatial cues from cochlear implant microphones can be leveraged to improve speech separation in real-world environments, addressing challenges posed by reverberation and spatial ambiguity for assistive hearing devices.

Contribution

It introduces methods to utilize both implicit and explicit spatial cues to enhance speech separation performance in cochlear implant scenarios, especially in complex acoustic environments.

Findings

01 Spatial cues improve separation for spatially separated talkers

02 Explicit spatial cues help most when implicit cues are weak

03 Training on real-world data improves model generalizability

Abstract

Speech separation approaches for single-channel, dry speech mixtures have significantly improved. However, real-world spatial and reverberant acoustic environments remain challenging, limiting the effectiveness of these approaches for assistive hearing devices like cochlear implants (CIs). To address this, we quantify the impact of real-world acoustic scenes on speech separation and explore how spatial cues can enhance separation quality efficiently. We analyze performance based on implicit spatial cues (inherent in the acoustic input and learned by the model) and explicit spatial cues (manually calculated spatial features added as auxiliary inputs). Our findings show that spatial cues (both implicit and explicit) improve separation for mixtures with spatially separated and nearby talkers. Furthermore, spatial cues enhance separation when spectral cues are ambiguous, such as when voices…
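As a rough illustration of what "manually calculated spatial features" can look like, the sketch below computes two widely used inter-channel cues from a two-microphone recording: the interaural level difference (ILD) and the interaural phase difference (IPD). This is an assumption for illustration only; the abstract does not state which explicit features the authors use, and the function names, STFT settings, and `eps` constant here are hypothetical.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Naive STFT: Hann-windowed frames, one-sided FFT. Returns (frames, bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)

def spatial_features(left, right, n_fft=512, hop=128, eps=1e-8):
    """Hypothetical helper: per time-frequency bin ILD (dB) and IPD (radians)."""
    L = stft(left, n_fft, hop)
    R = stft(right, n_fft, hop)
    # Level difference: ratio of magnitudes in dB (eps avoids division by zero).
    ild = 20.0 * np.log10((np.abs(L) + eps) / (np.abs(R) + eps))
    # Phase difference: angle of the cross-spectrum, wrapped to (-pi, pi].
    ipd = np.angle(L * np.conj(R))
    return ild, ipd

# Toy input: a 500 Hz tone that is half as loud in the right channel.
sr = 16000
t = np.arange(sr) / sr
left = np.sin(2 * np.pi * 500 * t)
right = 0.5 * left
ild, ipd = spatial_features(left, right)
```

Features like these could be stacked with the spectrogram as auxiliary input channels to a separation model, which is one plausible reading of "added as auxiliary inputs". How the features are actually combined with the model input is not specified in the abstract.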

Taxonomy

Topics: Speech and Audio Processing