Intelligibility prediction with a pretrained noise-robust automatic speech recognition model

Zehai Tu; Ning Ma; Jon Barker

arXiv:2310.19817 · eess.AS · November 1, 2023 · 1 citation

TL;DR

This paper develops two intelligibility prediction systems based on a pretrained noise-robust ASR model and shows that they predict intelligibility accurately and robustly for unseen noisy speech in the second Clarity Prediction Challenge (CPC2).

Contribution

It introduces both intrusive and non-intrusive intelligibility prediction systems derived from a pretrained noise-robust ASR model, without fine-tuning on CPC2 data.

Findings

1. High prediction accuracy on the CPC2 evaluation
2. Robustness to unseen noisy speech scenarios
3. Effective use of ASR hidden representations and uncertainty measures

Abstract

This paper describes two intelligibility prediction systems derived from a pretrained noise-robust automatic speech recognition (ASR) model for the second Clarity Prediction Challenge (CPC2). One system is intrusive and leverages the hidden representations of the ASR model. The other system is non-intrusive and makes predictions from the derived ASR uncertainty. The ASR model is pretrained only on a simulated noisy speech corpus and does not use any CPC2 data. The accurate prediction performance on the CPC2 evaluation therefore indicates that the intelligibility prediction systems are robust to unseen scenarios.
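
The abstract describes the two systems only at a high level. As a rough illustration of the intrusive versus non-intrusive distinction, here is a minimal pure-Python sketch under stated assumptions; it is not the authors' implementation. The intrusive score compares ASR hidden representations of a clean reference and the processed signal, assumed to be already time-aligned frame sequences, using mean cosine similarity. The non-intrusive score looks only at the processed signal and uses mean per-frame entropy of the ASR output posteriors as the uncertainty measure. All function names, the cosine and entropy measures, and the logistic mapping with parameters `a` and `b` are hypothetical choices made for this example.

```python
import math
from typing import Sequence

# Hypothetical sketch only: NOT the method or code from the paper.


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def intrusive_score(ref_hidden, proc_hidden) -> float:
    """Intrusive sketch: mean frame-wise cosine similarity between ASR hidden
    representations of the clean reference and of the processed signal.
    Assumption: both sequences are already time-aligned and equally long."""
    if not ref_hidden or len(ref_hidden) != len(proc_hidden):
        raise ValueError("expected non-empty, time-aligned frame sequences")
    sims = [cosine(r, p) for r, p in zip(ref_hidden, proc_hidden)]
    return sum(sims) / len(sims)


def frame_entropy(posterior: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one frame's output distribution."""
    return -sum(p * math.log(p) for p in posterior if p > 0.0)


def non_intrusive_uncertainty(posteriors) -> float:
    """Non-intrusive sketch: mean per-frame entropy of the ASR posteriors for
    the processed signal alone. Higher values mean a less certain ASR model."""
    if not posteriors:
        raise ValueError("expected at least one frame")
    return sum(frame_entropy(f) for f in posteriors) / len(posteriors)


def to_intelligibility(x: float, a: float, b: float) -> float:
    """Map a raw score to a 0-100 prediction with a logistic curve.
    a and b are hypothetical parameters that would have to be fitted to
    listener scores; a < 0 suits uncertainty, a > 0 suits similarity."""
    return 100.0 / (1.0 + math.exp(-(a * x + b)))


if __name__ == "__main__":
    ref = [[1.0, 0.0], [0.0, 1.0]]
    proc = [[1.0, 0.0], [1.0, 1.0]]
    print(intrusive_score(ref, proc))
    post = [[0.5, 0.5], [1.0, 0.0]]
    print(non_intrusive_uncertainty(post))
    print(to_intelligibility(0.0, a=-5.0, b=0.0))
```

The two entry points make the interface difference explicit: the intrusive function needs a reference signal's representations, while the non-intrusive one needs only the processed signal, which is what allows the latter to be used when no clean reference is available.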

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Speech and dialogue systems