On the Relationship Between Short-Time Objective Intelligibility and Short-Time Spectral-Amplitude Mean-Square Error for Speech Enhancement
Morten Kolbæk, Zheng-Hua Tan, Jesper Jensen

TL;DR
This paper investigates the relationship between the traditional short-time spectral-amplitude (STSA) MSE criterion and the STOI-based intelligibility measure in speech enhancement. It shows that the two are practically equivalent under certain general conditions and that the STSA minimum-MSE estimator is near optimal for STOI-based intelligibility.
Contribution
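It establishes a theoretical link between STSA-MSE and the envelope linear correlation (ELC) on which STOI relies, showing that the two criteria are practically equivalent under certain general conditions and that MSE is near optimal for improving STOI-based speech intelligibility.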
Findings
The STSA-MSE and ELC criteria are practically equivalent under certain general conditions.
The standard STSA minimum-MSE estimator is near optimal for STOI-based intelligibility.
Empirical data support the theoretical equivalence between the MSE and STOI-based measures.
Abstract
The majority of deep neural network (DNN) based speech enhancement algorithms rely on the mean-square error (MSE) criterion of short-time spectral amplitudes (STSA), which has no apparent link to human perception, e.g. speech intelligibility. Short-Time Objective Intelligibility (STOI), a popular state-of-the-art speech intelligibility estimator, on the other hand, relies on linear correlation of speech temporal envelopes. This raises the question if a DNN training criterion based on envelope linear correlation (ELC) can lead to improved speech intelligibility performance of DNN based speech enhancement algorithms compared to algorithms based on the STSA-MSE criterion. In this paper we derive that, under certain general conditions, the STSA-MSE and ELC criteria are practically equivalent, and we provide empirical data to support our theoretical results. Furthermore, our experimental…
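To make the two criteria named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It computes a mean-square error and a sample linear (Pearson) correlation for one short envelope segment. The function names, the six-frame toy envelope, and the gain factor of 2 are all assumptions made for illustration. The real STOI measure involves further processing that is not shown here.

```python
import math


def stsa_mse(clean, enhanced):
    """Mean-square error between two equal-length amplitude sequences."""
    return sum((c - e) ** 2 for c, e in zip(clean, enhanced)) / len(clean)


def envelope_linear_correlation(clean, enhanced):
    """Sample Pearson correlation between two temporal envelopes."""
    n = len(clean)
    mean_c = sum(clean) / n
    mean_e = sum(enhanced) / n
    # Remove the mean of each envelope before correlating.
    dc = [c - mean_c for c in clean]
    de = [e - mean_e for e in enhanced]
    num = sum(a * b for a, b in zip(dc, de))
    den = math.sqrt(sum(a * a for a in dc) * sum(b * b for b in de))
    return num / den


# Hypothetical envelope of one frequency band over six frames (toy values).
clean = [0.2, 0.9, 1.4, 0.7, 0.3, 1.1]
# A toy "enhanced" envelope that differs from the clean one only by a gain.
scaled = [2.0 * c for c in clean]

print("MSE, clean vs scaled:", stsa_mse(clean, scaled))
print("ELC, clean vs scaled:", envelope_linear_correlation(clean, scaled))
```

In this toy case the correlation is unaffected by the gain while the MSE is not, so the two criteria are not identical in general. The equivalence discussed in the paper is a practical one that holds under the conditions the authors derive.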
