Non-Intrusive Speech Intelligibility Prediction for Hearing Aids using Whisper and Metadata

Ryandhimas E. Zezario; Fei Chen; Chiou-Shann Fuh; Hsin-Min Wang; Yu Tsao

arXiv:2309.09548·eess.AS·June 14, 2024

TL;DR

This paper introduces MBI-Net+ with Whisper embeddings, speech metadata, and HASPI integration, significantly improving speech intelligibility prediction accuracy for hearing aids over previous models.

Contribution

The paper presents three novel methods and an enhanced model, MBI-Net+, that improve speech intelligibility prediction accuracy by leveraging Whisper embeddings, metadata, and HASPI metrics.

Findings

01

MBI-Net+ outperforms several intrusive baseline systems and MBI-Net on the Clarity Prediction Challenge 2023 dataset.

02

Incorporating Whisper embeddings improves acoustic feature representation.

03

Using HASPI as a supplementary metric enhances prediction performance.

Abstract

Automated speech intelligibility assessment is pivotal for hearing aid (HA) development. In this paper, we present three novel methods to improve intelligibility prediction accuracy and introduce MBI-Net+, an enhanced version of MBI-Net, the top-performing system in the 1st Clarity Prediction Challenge. MBI-Net+ leverages Whisper's embeddings to create cross-domain acoustic features and includes metadata from speech signals by using a classifier that distinguishes different enhancement methods. Furthermore, MBI-Net+ integrates the hearing-aid speech perception index (HASPI) as a supplementary metric into the objective function to further boost prediction performance. Experimental results demonstrate that MBI-Net+ surpasses several intrusive baseline systems and MBI-Net on the Clarity Prediction Challenge 2023 dataset, validating the effectiveness of incorporating Whisper embeddings,…
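
The abstract says HASPI is integrated into the objective function as a supplementary metric. A minimal sketch of what such a multi-task objective could look like is shown below. It is an illustration under stated assumptions, not the authors' implementation: the use of mean squared error for both terms, the function names `mse` and `multitask_loss`, and the `haspi_weight` parameter are all hypothetical, and the source does not give the actual loss form or weighting.

```python
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equal in length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multitask_loss(
    intel_pred: Sequence[float],
    intel_true: Sequence[float],
    haspi_pred: Sequence[float],
    haspi_true: Sequence[float],
    haspi_weight: float = 0.5,  # hypothetical weight, not taken from the paper
) -> float:
    """Primary intelligibility loss plus a weighted auxiliary HASPI loss.

    Assumption: both terms are MSE and are combined by a weighted sum.
    """
    primary = mse(intel_pred, intel_true)
    auxiliary = mse(haspi_pred, haspi_true)
    return primary + haspi_weight * auxiliary


# Example: scores scaled to [0, 1] for two utterances.
loss = multitask_loss(
    intel_pred=[0.5, 0.8],
    intel_true=[1.0, 0.8],
    haspi_pred=[0.2, 0.9],
    haspi_true=[0.4, 0.9],
)
print(f"combined loss: {loss:.4f}")
```

The idea behind this kind of design is that the auxiliary HASPI target gives the model an extra training signal correlated with intelligibility. Setting `haspi_weight` to zero recovers a single-task objective.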

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Hearing Loss and Rehabilitation