Prediction of speech intelligibility with DNN-based performance measures
Angel Mario Castro Martinez, Constantin Spille, Jana Roßbach, Birger Kollmeier, Bernd T. Meyer

TL;DR
This paper introduces a DNN-based speech intelligibility model that predicts word error rates without needing clean speech references or word labels, showing comparable accuracy to label-based models and outperforming baselines.
Contribution
The study develops a novel DNN-based speech intelligibility model that omits the decoding step, reducing complexity and enabling real-time implementation in hearing aids.
Findings
The TDNN model matches the performance of the DNN while using fewer parameters.
The proposed model predicts speech reception thresholds accurately across various noise types.
It outperforms five established models in prediction accuracy.
Abstract
This paper presents a speech intelligibility model based on automatic speech recognition (ASR), combining phoneme probabilities from deep neural networks (DNN) and a performance measure that estimates the word error rate from these probabilities. This model requires neither the clean speech reference nor the word labels during testing, because the ASR decoding step, which finds the most likely sequence of words given phoneme posterior probabilities, is omitted. The model is evaluated via the root-mean-squared error between the predicted and observed speech reception thresholds from eight normal-hearing listeners. The recognition task consists of identifying noisy words from a German matrix sentence test. The speech material was mixed with eight noise maskers covering different modulation types, from speech-shaped stationary noise to a single-talker masker. The prediction performance is…
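The evaluation metric named in the abstract, the root-mean-squared error between predicted and observed speech reception thresholds (SRTs), can be illustrated with a minimal sketch. The function name `rmse` and all SRT values below are hypothetical placeholders chosen for illustration; they are not results from the paper, and the paper's exact averaging scheme (per masker, per listener, or both) is not given in this excerpt.

```python
import math


def rmse(predicted, observed):
    """Root-mean-squared error between two equal-length sequences."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical SRTs in dB SNR, one value per noise masker (eight maskers).
# These numbers are made up for illustration only.
predicted_srt = [-7.0, -9.5, -12.0, -15.5, -18.0, -20.5, -16.0, -19.0]
observed_srt = [-7.5, -9.0, -13.0, -15.0, -19.5, -21.0, -17.0, -20.0]

print(f"RMSE: {rmse(predicted_srt, observed_srt):.2f} dB")
```

A lower RMSE means the predicted thresholds lie closer to the listeners' measured thresholds across the maskers.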
