Modeling Multi-Level Hearing Loss for Speech Intelligibility Prediction

Xiajie Zhou; Candy Olivia Mawalim; Masashi Unoki

arXiv:2507.22599 · eess.AS · July 31, 2025 · WASPAA

TL;DR

This paper introduces a speech intelligibility prediction method that models the effects of hearing loss by simulating auditory degradations and analyzing spectro-temporal modulations. It outperforms existing intelligibility indices, especially for hearing-impaired listeners.

Contribution

It presents a new auditory-inspired feature extraction method and a Transformer-based regression model that explicitly account for hearing loss severity, improving the accuracy of speech intelligibility prediction.

Findings

01

Outperforms HASPI v2 in predicting speech intelligibility for hearing-impaired groups.

02

Reduces prediction error by 16.5% for mild hearing loss and 6.1% for moderate-to-severe loss.

03

Highlights the importance of modeling frequency and temporal resolution deficits.
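Finding 02 does not state how the percentages are defined. Assuming they are relative reductions of a prediction error E (for example, RMSE) with respect to a baseline such as HASPI v2, they would be computed as follows:

```latex
\Delta_{\mathrm{rel}} = \frac{E_{\mathrm{baseline}} - E_{\mathrm{proposed}}}{E_{\mathrm{baseline}}} \times 100\,\%
```

Under that assumption, a 16.5% reduction means the proposed method's error is 0.835 times the baseline error, and a 6.1% reduction means it is 0.939 times the baseline error.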

Abstract

The diverse perceptual consequences of hearing loss severely impede speech communication, but standard clinical audiometry, which focuses on threshold-based frequency sensitivity, does not adequately capture deficits in frequency and temporal resolution. To address this limitation, we propose a speech intelligibility prediction method that explicitly simulates auditory degradations according to hearing loss severity by broadening cochlear filters and applying low-pass modulation filtering to temporal envelopes. Speech signals are then analyzed using spectro-temporal modulation (STM) representations, which reflect how a loss of auditory resolution alters the underlying modulation structure. In addition, normalized cross-correlation (NCC) matrices quantify the similarity between the STM representations of clean speech and speech in noise. These auditory-informed features are…
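The abstract names these processing steps but does not give their parameters on this page. The sketch below illustrates three of them in plain Python under stated assumptions. Cochlear filter broadening is modeled by scaling the Glasberg and Moore (1990) equivalent rectangular bandwidth (ERB) by a factor. The low-pass modulation filter is swapped for a simple first-order IIR filter applied to an envelope. The NCC matrix compares every row of a clean STM representation with every row of a noisy one. The function names (`broadened_erb`, `lowpass_envelope`, `ncc`, `ncc_matrix`), the one-pole filter, and the row layout are hypothetical choices for illustration and are not the authors' implementation. The STM analysis itself is not sketched.

```python
import math


def broadened_erb(f_hz: float, factor: float = 1.0) -> float:
    """ERB in Hz (Glasberg & Moore, 1990) scaled by a hypothetical
    broadening factor; factor > 1 mimics reduced frequency selectivity."""
    return factor * 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def lowpass_envelope(env, fs: float, cutoff_hz: float):
    """First-order IIR low-pass over a temporal envelope sampled at fs Hz.

    The -3 dB point is approximately cutoff_hz when cutoff_hz << fs.
    A lower cutoff removes faster modulations, a simple stand-in for
    reduced temporal resolution (assumption, not the paper's filter).
    """
    alpha = math.exp(-2.0 * math.pi * cutoff_hz / fs)
    out, y = [], 0.0
    for x in env:
        y = (1.0 - alpha) * x + alpha * y
        out.append(y)
    return out


def ncc(a, b) -> float:
    """Zero-mean normalized cross-correlation of two equal-length,
    non-empty vectors; returns 0.0 if either vector is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [x - ma for x in a]
    db = [x - mb for x in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den > 0 else 0.0


def ncc_matrix(stm_clean, stm_noisy):
    """Entry [i][k] is the NCC between row i of the clean STM and row k of
    the noisy STM (hypothetical layout: rows are channels, columns are
    modulation bins)."""
    return [[ncc(rc, rn) for rn in stm_noisy] for rc in stm_clean]


# Usage with toy matrices (not data from the paper).
clean = [[1.0, 2.0, 3.0], [3.0, 1.0, 0.0]]
noisy = [[2.0, 4.0, 6.0], [0.0, 1.0, 3.0]]
M = ncc_matrix(clean, noisy)  # M[0][0] is 1.0: the rows differ only in scale
```

Lowering `cutoff_hz` or raising `factor` makes the simulated degradation stronger. This page does not state how the paper maps each severity level to specific values.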
