Non-Intrusive Binaural Speech Intelligibility Prediction Using Mamba for Hearing-Impaired Listeners
Katsuhiko Yamamoto, Koichi Miyazaki

TL;DR
This paper introduces a Mamba-based binaural speech intelligibility prediction model that offers a computationally efficient alternative to transformer-based models, maintaining high accuracy for hearing-impaired listeners.
Contribution
It proposes replacing transformer self-attention with Mamba blocks in SIP models to reduce complexity while preserving performance.
Findings
Mamba-based SIP models achieve prediction accuracy competitive with the baseline.
The proposed model maintains a relatively small number of parameters.
Analysis suggests that bidirectional Mamba effectively captures contextual and spatial information.
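To make the idea of bidirectional temporal processing concrete, here is a minimal sketch of a scan that runs over the frames in both directions. It is an illustration under strong assumptions, not the authors' model: real Mamba blocks use input-dependent (selective) state-space parameters, whereas this toy uses a single fixed `decay` coefficient. The names `scan`, `bidirectional_scan`, and `decay` are hypothetical.

```python
# Illustrative sketch only: a fixed-coefficient linear recurrence.
# This is NOT the paper's model and NOT a real Mamba block, which uses
# input-dependent (selective) state-space parameters.

def scan(xs, decay=0.5):
    """Causal linear recurrence: h[t] = decay * h[t-1] + x[t]."""
    h, out = 0.0, []
    for x in xs:
        h = decay * h + x
        out.append(h)
    return out


def bidirectional_scan(xs, decay=0.5):
    """Run the scan forward and backward, then pair both states per frame."""
    fwd = scan(xs, decay)                # each state summarizes past frames
    bwd = scan(xs[::-1], decay)[::-1]    # each state summarizes future frames
    return list(zip(fwd, bwd))


frames = [1.0, 0.0, 0.0, 2.0]            # hypothetical per-frame feature values
states = bidirectional_scan(frames)
for t, (f, b) in enumerate(states):
    print(t, f, b)
```

Each frame ends up with one state built from earlier frames and one built from later frames, which is the sense in which a bidirectional scan can use context on both sides at a cost that grows linearly with the number of frames.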
Abstract
Speech intelligibility prediction (SIP) models have been used as objective metrics to assess intelligibility for hearing-impaired (HI) listeners. In the Clarity Prediction Challenge 2 (CPC2), non-intrusive binaural SIP models based on transformers showed high prediction accuracy. However, the self-attention mechanism theoretically incurs high computational and memory costs, making it a bottleneck for low-latency, power-efficient devices. This may also degrade the temporal processing of binaural SIPs. Therefore, we propose Mamba-based SIP models instead of transformers for the temporal processing blocks. Experimental results show that our proposed SIP model achieves competitive performance compared to the baseline while maintaining a relatively small number of parameters. Our analysis suggests that the SIP model based on bidirectional Mamba effectively captures contextual and spatial…
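The cost argument in the abstract can be illustrated with rough arithmetic. The sketch below counts only the entries of a single-head attention score matrix against the number of recurrent state updates in a two-direction scan. It ignores feature dimensions, constant factors, and hardware effects, and the figures are not measurements from the paper. The function names are hypothetical.

```python
# Back-of-the-envelope scaling comparison (assumption-laden, not measured).

def attention_score_entries(num_frames):
    """Entries in the T x T self-attention score matrix for one head."""
    return num_frames * num_frames


def scan_steps(num_frames, directions=2):
    """Recurrent state updates for a scan run in `directions` directions."""
    return num_frames * directions


for t in (100, 1000, 10000):
    print(t, attention_score_entries(t), scan_steps(t))
```

The attention count grows quadratically with the number of frames while the scan count grows linearly, which is the theoretical gap the abstract points to for long inputs on low-latency, power-constrained devices.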
Taxonomy
Topics: Hearing Loss and Rehabilitation · Speech and Audio Processing · Voice and Speech Disorders
