An evaluation of intrusive instrumental intelligibility metrics
Steven Van Kuyk, W. Bastiaan Kleijn, and Richard C. Hendriks

TL;DR
This paper evaluates 12 instrumental intelligibility metrics across various distortions, identifying top performers SIIB and HASPI, and introduces a faster version of SIIB with comparable accuracy, highlighting challenges in generalization.
Contribution
It provides a comprehensive evaluation of existing intelligibility metrics, analyzes their generalization limitations, and proposes a new, computationally efficient SIIB variant.
Findings
SIIB and HASPI achieved the highest correlation with listening tests.
Intelligibility metrics perform poorly on unseen distortion types.
Modifying SIIB and STOI to reduce statistical dependencies improves their performance.
Abstract
Instrumental intelligibility metrics are commonly used as an alternative to listening tests. This paper evaluates 12 monaural intrusive intelligibility metrics: SII, HEGP, CSII, HASPI, NCM, QSTI, STOI, ESTOI, MIKNN, SIMI, SIIB, and a faster variant of SIIB. In addition, this paper investigates the ability of intelligibility metrics to generalize to new types of distortions and analyzes why the top performing metrics have high performance. The intelligibility data were obtained from 11 listening tests described in the literature. The stimuli included Dutch, Danish, and English speech that was distorted by additive noise, reverberation, competing talkers, pre-processing enhancement, and post-processing enhancement. SIIB and HASPI had the highest performance, as measured by their average correlation with listening test scores. The high performance of…
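The abstract ranks metrics by their correlation with listening test scores, averaged over the listening tests. A minimal sketch of that kind of summary is given below, assuming a plain Pearson correlation computed per data set and then averaged; the paper's exact procedure is not stated in this summary and may differ. The function names, the data set names, and all numbers are hypothetical and for illustration only.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def average_correlation(datasets):
    """Return (mean correlation, per-data-set correlations).

    datasets maps a data set name to a pair
    (metric_scores, listening_test_scores), one entry per condition.
    """
    per_set = {name: pearson(m, s) for name, (m, s) in datasets.items()}
    mean_rho = sum(per_set.values()) / len(per_set)
    return mean_rho, per_set


# Hypothetical toy data: metric output vs. percent words correct.
# These values are NOT taken from the paper.
toy = {
    "test_a": ([0.1, 0.4, 0.7, 0.9], [12.0, 45.0, 80.0, 95.0]),
    "test_b": ([0.2, 0.3, 0.6, 0.8], [20.0, 25.0, 70.0, 90.0]),
}

mean_rho, per_set = average_correlation(toy)
for name, rho in per_set.items():
    print(f"{name}: rho = {rho:.3f}")
print(f"average: rho = {mean_rho:.3f}")
```

Averaging per-data-set correlations, rather than pooling all conditions into one correlation, keeps each listening test on its own scale, which matters when tests differ in language, material, and scoring.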
