Audio-Visual Feature Synchronization for Robust Speech Enhancement in Hearing Aids
Nasir Saleem, Mandar Gogate, Kia Dashtipour, Adeel Hussain, Usman Anwar, Adewale Adetomi, Tughrul Arslan, and Amir Hussain

TL;DR
This paper introduces a lightweight audio-visual speech enhancement model for hearing aids that synchronizes audio and visual features to improve speech clarity in noisy environments, achieving real-time performance with low latency and energy use.
Contribution
It proposes a novel cross-attentional model that effectively aligns audio-visual features for speech enhancement, optimized for real-time hearing aid applications.
Findings
Significant improvements in perceptual quality (PESQ: 0.52)
Enhanced speech intelligibility (STOI: 19%)
High fidelity with SI-SDR of 10.10 dB
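For readers unfamiliar with the last metric, the sketch below shows the commonly used definition of scale-invariant signal-to-distortion ratio (SI-SDR). It is an illustration only: this summary does not describe the authors' evaluation code, and the function name `si_sdr` and the `eps` stabilizer are assumptions made for this example.

```python
import math

def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB between two equal-length signals.

    Illustrative sketch of the standard definition, not the paper's code.
    `eps` is an assumed small constant that avoids division by zero.
    """
    n = len(reference)
    # Remove the mean so a DC offset does not affect the score.
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [e - est_mean for e in estimate]
    ref = [r - ref_mean for r in reference]

    # Project the estimate onto the reference to get the scaled target.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref)
    scale = dot / (ref_energy + eps)
    target = [scale * r for r in ref]

    # Everything left over after the projection counts as distortion.
    residual = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(x * x for x in residual)
    return 10 * math.log10((target_energy + eps) / (residual_energy + eps))
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the enhanced signal leaves the score essentially unchanged, which is why the metric is called scale-invariant.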
Abstract
Audio-visual feature synchronization for real-time speech enhancement (SE) in hearing aids is a progressive approach to improving speech intelligibility and user experience, particularly in highly noisy backgrounds. The approach integrates auditory signals with visual cues, exploiting the complementary information in the two modalities. It can be further optimized with an efficient feature alignment module. In this study, a lightweight cross-attentional model learns robust audio-visual representations by exploiting large-scale data and a simple architecture. When this model is incorporated into an audio-visual speech enhancement (AVSE) framework, the neural system dynamically emphasizes critical features across the audio and visual modalities, enabling precise synchronization and improved speech…
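The abstract does not give the architecture in detail, so the following is only a minimal sketch of generic scaled dot-product cross-attention, in which each audio frame attends over the visual frames. The function names, the choice of audio as queries and visual features as keys and values, and the absence of learned projection matrices are all assumptions made for illustration; they are not taken from the paper.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(audio, visual):
    """Align visual features to audio frames (hypothetical sketch).

    audio:  list of T_a feature vectors, each of length d (used as queries)
    visual: list of T_v feature vectors, each of length d (keys and values)
    Returns (aligned, weights), where aligned has one d-dim vector per
    audio frame and weights[i][j] is how much audio frame i attends to
    visual frame j.
    """
    d = len(audio[0])
    aligned, weights = [], []
    for query in audio:
        # Scaled dot-product similarity between this audio frame
        # and every visual frame.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
            for key in visual
        ]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of visual features, one output per audio frame.
        aligned.append([
            sum(w[j] * visual[j][c] for j in range(len(visual)))
            for c in range(d)
        ])
    return aligned, weights
```

The output has one row per audio frame regardless of how many visual frames there are, so the two streams can have different frame counts and still be combined frame by frame afterwards.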
