End-to-end audio-visual learning for cochlear implant sound coding simulations in noisy environments
Meng-Ping Lin, Enoch Hsin-Ho Huang, Shao-Yi Chien, Yu Tsao

TL;DR
This paper presents AVSE-ECS, an end-to-end audio-visual deep learning system for cochlear implant (CI) sound coding that, in simulations, improves objective speech intelligibility in noisy environments by integrating visual cues with sound processing.
Contribution
It introduces AVSE-ECS, a system that combines an audio-visual speech enhancement (AVSE) module with the ElectrodeNet-CS (ECS) sound coding model and, with joint training, improves the signal-to-error ratio over the advanced combination encoder (ACE) strategy.
Findings
Achieved a 7.47 dB increase in signal-to-error ratio (SER) over the ACE strategy with joint training.
Achieved high objective speech intelligibility in noisy conditions.
Validated the system through simulations.
Abstract
The cochlear implant (CI) is a successful biomedical device that enables individuals with severe-to-profound hearing loss to perceive sound through electrical stimulation, yet listening in noise remains challenging. Recent deep learning advances offer promising potential for CI sound coding by integrating visual cues. In this study, an audio-visual speech enhancement (AVSE) module is integrated with the ElectrodeNet-CS (ECS) model to form the end-to-end CI system, AVSE-ECS. Simulations show that the AVSE-ECS system with joint training achieves high objective speech intelligibility and improves the signal-to-error ratio (SER) by 7.4666 dB compared to the advanced combination encoder (ACE) strategy. These findings underscore the potential of AVSE-based CI sound coding.
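The SER figure quoted above is a difference between two decibel values. The sketch below shows one common way to compute such a ratio, assuming SER is defined as 10·log10 of reference energy over error energy; the abstract does not state the paper's exact definition, so this formula, the function name `signal_to_error_ratio_db`, and the toy numbers are all illustrative assumptions rather than the authors' method or data.

```python
import math
from typing import Sequence


def signal_to_error_ratio_db(reference: Sequence[float],
                             estimate: Sequence[float]) -> float:
    """Return 10*log10(reference energy / error energy) in dB.

    Assumed definition for illustration; the paper may define SER differently.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if signal_energy == 0.0:
        raise ValueError("reference has zero energy; SER is undefined")
    if error_energy == 0.0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_energy / error_energy)


# Hypothetical toy values, not data from the paper.
clean = [1.0, 2.0, 3.0, 4.0]      # stands in for a reference output
baseline = [1.5, 2.5, 2.5, 3.5]   # larger deviation from the reference
improved = [1.1, 2.1, 2.9, 3.9]   # smaller deviation from the reference

# An "SER improvement" is the difference of the two dB values.
gain_db = (signal_to_error_ratio_db(clean, improved)
           - signal_to_error_ratio_db(clean, baseline))
print(f"SER gain: {gain_db:.2f} dB")
```

Because the comparison is made in decibels, a gain of about 7.47 dB under this definition would correspond to the error energy shrinking by a factor of roughly 10^(0.747) ≈ 5.6 relative to the baseline.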
