End-to-end audio-visual learning for cochlear implant sound coding simulations in noisy environments

Meng-Ping Lin; Enoch Hsin-Ho Huang; Shao-Yi Chien; Yu Tsao

arXiv:2508.13576 · eess.AS · January 30, 2026

TL;DR

This paper presents an end-to-end audio-visual deep learning system for cochlear implant (CI) sound coding that uses visual cues alongside audio and, in simulations, improves objective speech intelligibility in noisy environments.

Contribution

It introduces AVSE-ECS, an end-to-end system that integrates an audio-visual speech enhancement (AVSE) module with the ElectrodeNet-CS (ECS) sound coding model, and reports improved simulated performance over the advanced combination encoder (ACE) strategy.

Findings

01

Improved the signal-to-error ratio (SER) by 7.47 dB relative to the advanced combination encoder (ACE) strategy.

02

Achieved high objective speech intelligibility in noisy conditions with joint training.

03

Evaluated the system through CI sound coding simulations.

Abstract

The cochlear implant (CI) is a successful biomedical device that enables individuals with severe-to-profound hearing loss to perceive sound through electrical stimulation, yet listening in noise remains challenging. Recent deep learning advances offer promising potential for CI sound coding by integrating visual cues. In this study, an audio-visual speech enhancement (AVSE) module is integrated with the ElectrodeNet-CS (ECS) model to form the end-to-end CI system, AVSE-ECS. Simulations show that the AVSE-ECS system with joint training achieves high objective speech intelligibility and improves the signal-to-error ratio (SER) by 7.4666 dB compared to the advanced combination encoder (ACE) strategy. These findings underscore the potential of AVSE-based CI sound coding.
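
The abstract reports its gain as a signal-to-error ratio in decibels without giving the formula. As a minimal sketch, assuming the common definition of SER as the ratio of reference energy to error energy on a log scale, the metric and an improvement over a baseline could be computed as below. The function names, the toy sequences, and the use of flat lists of samples are illustrative assumptions, not the authors' implementation.

```python
import math


def signal_to_error_ratio_db(reference, estimate):
    """SER in dB, assuming SER = 10 * log10(sum(ref^2) / sum((ref - est)^2)).

    This definition is an assumption; the paper's exact formula is not
    stated in the abstract.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if signal_energy == 0.0:
        raise ValueError("reference has zero energy; SER is undefined")
    if error_energy == 0.0:
        return math.inf  # estimate matches the reference exactly
    return 10.0 * math.log10(signal_energy / error_energy)


def ser_improvement_db(reference, baseline, proposed):
    """Difference in SER (dB) between a proposed output and a baseline output."""
    return (signal_to_error_ratio_db(reference, proposed)
            - signal_to_error_ratio_db(reference, baseline))


# Toy values (hypothetical, not data from the paper).
reference = [1.0, -1.0, 1.0, -1.0]
baseline = [0.0, 0.0, 0.0, 0.0]     # error energy equals signal energy
proposed = [0.9, -0.9, 0.9, -0.9]   # error is 10% of each sample
gain = ser_improvement_db(reference, baseline, proposed)
```

Because SER is logarithmic, an improvement of about 7.47 dB corresponds to the error energy shrinking by a factor of roughly 10^(0.747), or about 5.6, relative to the baseline under this definition.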
