# Speech and music source separation for cochlear implant users: front-end and end-to-end approach

**Authors:** Sina Tahmasebi, Waldo Nogueira

PMC · DOI: 10.3389/fnins.2025.1696669 · 2026-01-13

## TL;DR

This study compares front-end and end-to-end DNN-based source separation for cochlear implant users on two tasks: speech masked by competing speech, and music with singing voice.

## Contribution

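The study evaluates front-end and end-to-end DNN-based source separation pipelines for cochlear implant users on a speech task and a music task, first with objective instrumental metrics and then in a listening experiment with nine bilateral CI users.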

## Key findings

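- The end-to-end pipeline outperformed the front-end pipeline in speech understanding tasks.
- The front-end approach yielded higher scores in music appreciation questionnaires.
- All pipelines were first evaluated with objective instrumental metrics, then assessed in a listening experiment with nine bilateral CI users.
- The authors hypothesize that end-to-end music source separation is limited by the lack of a sound coding strategy tailored to instrumental music.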

## Abstract

A cochlear implant (CI) is a surgically implanted neuroprosthetic device designed to restore auditory perception in individuals with profound sensorineural hearing loss. While CI users generally demonstrate good speech intelligibility in quiet listening environments, their performance significantly declines in the presence of competing sound sources. Moreover, music perception and appreciation remain limited for many CI users. These limitations are largely attributed to the inadequate representation of pitch information, which is critical for both music and speech stream segregation in complex auditory scenes. To address these challenges, source separation techniques have been increasingly employed to enhance target speech and isolate singing voices in music. Previous research has shown that CI users report greater music enjoyment when vocals are enhanced relative to the accompanying background instrumentation. Building on this, recent studies have leveraged deep neural networks (DNNs) as both front-end and end-to-end modules to improve speech intelligibility and music enjoyment for CI users. In the present study, we compare front-end and end-to-end DNN-based source separation approaches for two tasks: speech masked by competing speech and singing music. All implemented pipelines were first evaluated using objective instrumental metrics. Based on these results, the models were subsequently assessed in a listening experiment involving nine bilateral CI users. While the end-to-end pipeline outperformed the front-end pipeline in speech understanding tasks, the front-end approach yielded higher scores in music appreciation questionnaires. These findings support the hypothesis that CI sound coding strategies can be effectively combined with DNN-based source separation models. 
Furthermore, we hypothesize that the limited performance of end-to-end music source separation in enhancing music perception for CI users may be due to the absence of a dedicated sound coding strategy tailored for instrumental music.
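The abstract notes that CI users report greater music enjoyment when vocals are enhanced relative to the accompaniment. As a minimal sketch of that remixing step only, assuming a separation model has already produced a vocal estimate and an accompaniment estimate, the two can be recombined with an attenuated accompaniment. The function names, the plain-list signal representation, and the default gain of -6 dB are illustrative assumptions, not values or code from the paper.

```python
import math


def remix(vocals, accompaniment, accompaniment_gain_db=-6.0):
    """Mix two equal-length mono signals, scaling the accompaniment by a gain in dB.

    Hypothetical helper: the gain value is an assumption, not taken from the paper.
    """
    if len(vocals) != len(accompaniment):
        raise ValueError("signals must have the same length")
    gain = 10.0 ** (accompaniment_gain_db / 20.0)  # dB to linear amplitude
    return [v + gain * a for v, a in zip(vocals, accompaniment)]


def rms(signal):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def vocal_to_accompaniment_db(vocals, accompaniment):
    """Level of the vocals relative to the accompaniment, in dB."""
    return 20.0 * math.log10(rms(vocals) / rms(accompaniment))


# Toy stand-ins for separated stems: two sinusoids sampled at 16 kHz.
sample_rate = 16000
vocals = [math.sin(2 * math.pi * 220 * n / sample_rate) for n in range(sample_rate)]
accompaniment = [math.sin(2 * math.pi * 330 * n / sample_rate) for n in range(sample_rate)]

mixture = remix(vocals, accompaniment, accompaniment_gain_db=-6.0)
```

In a front-end arrangement, a remixed signal like `mixture` would then be passed to the unchanged CI sound coding strategy; an end-to-end model would instead map audio directly to the stimulation pattern, which this sketch does not attempt to show.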

## Full-text entities

- **Diseases:** sensorineural hearing loss (MESH:D006319)

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12835400/full.md

---
Source: https://tomesphere.com/paper/PMC12835400