Unveiling the Best Practices for Applying Speech Foundation Models to Speech Intelligibility Prediction for Hearing-Impaired People

Haoshuai Zhou; Boxuan Cao; Changgeng Mo; Linkai Li; Shan Xiang Wang

arXiv:2505.08215 · cs.AI · May 14, 2025

TL;DR

This study systematically investigates how to optimize speech foundation models for predicting speech intelligibility in hearing-impaired individuals, revealing key design choices that enhance performance.

Contribution

The paper provides the first comprehensive analysis of design factors affecting speech foundation model (SFM)-based speech intelligibility prediction for hearing-impaired people, including encoder layer selection, temporal modeling in the prediction head, and ensembling.
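To make the layer-selection design factor concrete, here is a minimal NumPy sketch contrasting the two strategies: taking the output of one encoder layer versus fusing all layers with a softmax-weighted sum. It assumes the encoder's per-layer outputs are already available as one array; the function names, array sizes, and the choice of layer 6 are hypothetical and are not taken from the paper.

```python
import numpy as np


def select_single_layer(hidden_states: np.ndarray, layer_index: int) -> np.ndarray:
    """Return the frame features of one encoder layer.

    hidden_states has shape (num_layers, num_frames, feature_dim).
    """
    return hidden_states[layer_index]


def weighted_sum_all_layers(hidden_states: np.ndarray, layer_logits: np.ndarray) -> np.ndarray:
    """Fuse all layers with softmax weights (the 'use-all-layers' style)."""
    weights = np.exp(layer_logits - layer_logits.max())  # numerically stable softmax
    weights /= weights.sum()
    # Contract the layer axis: (L,) x (L, T, D) -> (T, D)
    return np.tensordot(weights, hidden_states, axes=(0, 0))


# Hypothetical sizes: 13 layers, 50 frames, 8-dimensional features.
rng = np.random.default_rng(0)
hidden_states = rng.standard_normal((13, 50, 8))

single = select_single_layer(hidden_states, layer_index=6)
fused = weighted_sum_all_layers(hidden_states, layer_logits=np.zeros(13))

print(single.shape, fused.shape)
```

With all logits equal, the weighted sum reduces to a plain average over layers. In a trained system the logits would be learned, and the best single layer would be chosen by validation; which layer that is for a given SFM is not stated in this summary.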

Findings

01 Selecting a single encoder layer outperforms the conventional use-all-layers approach.

02 Temporal modeling in the prediction head is crucial for accurate prediction.

03 Ensembling multiple SFMs improves performance, with stronger individual models providing greater benefit.
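One way to see why temporal modeling can matter in a prediction head: a mean-pooling head discards frame order, while a head that processes frames sequentially does not. The sketch below compares the two on random features. The simple recurrent head is an illustrative stand-in, not the architecture evaluated in the paper, and all weights and sizes are hypothetical.

```python
import numpy as np


def mean_pool_head(frames: np.ndarray, w_out: np.ndarray) -> float:
    """Average frames over time, then apply a linear readout (order-insensitive)."""
    return float(frames.mean(axis=0) @ w_out)


def recurrent_head(frames: np.ndarray, w_in: np.ndarray,
                   w_rec: np.ndarray, w_out: np.ndarray) -> float:
    """Run a simple tanh recurrence over frames, then read out the final state."""
    state = np.zeros(w_rec.shape[0])
    for frame in frames:
        state = np.tanh(frame @ w_in + state @ w_rec)
    return float(state @ w_out)


rng = np.random.default_rng(0)
frames = rng.standard_normal((20, 8))      # 20 frames, 8-dim features (hypothetical)
w_in = rng.standard_normal((8, 4)) * 0.5   # untrained, random weights
w_rec = rng.standard_normal((4, 4)) * 0.5
w_out_pool = rng.standard_normal(8)
w_out_rec = rng.standard_normal(4)

reversed_frames = frames[::-1]             # same frames, opposite temporal order

pooled_same = np.isclose(mean_pool_head(frames, w_out_pool),
                         mean_pool_head(reversed_frames, w_out_pool))
recurrent_same = np.isclose(recurrent_head(frames, w_in, w_rec, w_out_rec),
                            recurrent_head(reversed_frames, w_in, w_rec, w_out_rec))

# Pooling cannot tell the two orderings apart; the recurrent head is expected to.
print(pooled_same, recurrent_same)
```

The point of the comparison is only that the pooled output is identical for both orderings, whereas a sequential head can respond to where in the utterance a degradation occurs.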

Abstract

Speech foundation models (SFMs) have demonstrated strong performance across a variety of downstream tasks, including speech intelligibility prediction for hearing-impaired people (SIP-HI). However, optimizing SFMs for SIP-HI has been insufficiently explored. In this paper, we conduct a comprehensive study to identify key design factors affecting SIP-HI performance with 5 SFMs, focusing on encoder layer selection, prediction head architecture, and ensemble configurations. Our findings show that, contrary to traditional use-all-layers methods, selecting a single encoder layer yields better results. Additionally, temporal modeling is crucial for effective prediction heads. We also demonstrate that ensembling multiple SFMs improves performance, with stronger individual models providing greater benefit. Finally, we explore the relationship between key SFM attributes and their impact on…
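The ensembling idea can be sketched as averaging the intelligibility scores predicted by several SFM-based models for the same utterances. Simple unweighted averaging is an assumption made here for illustration; the ensemble configurations studied in the paper are not described in this summary. The scores below are hand-picked toy values, not experimental results.

```python
import numpy as np


def ensemble_mean(predictions: list) -> np.ndarray:
    """Average per-utterance predictions across models."""
    return np.mean(np.stack(predictions, axis=0), axis=0)


def rmse(pred: np.ndarray, target: np.ndarray) -> float:
    """Root-mean-square error between predicted and reference scores."""
    return float(np.sqrt(np.mean((np.asarray(pred) - np.asarray(target)) ** 2)))


# Toy intelligibility scores (percent of words correct) for three utterances.
targets = np.array([80.0, 40.0, 60.0])
model_a = np.array([90.0, 30.0, 60.0])   # hypothetical model A predictions
model_b = np.array([70.0, 50.0, 66.0])   # hypothetical model B predictions

combined = ensemble_mean([model_a, model_b])

print(rmse(model_a, targets), rmse(model_b, targets), rmse(combined, targets))
```

The toy values were chosen so that the two models err in opposite directions, which is the case where averaging helps most. When models make correlated errors the gain shrinks, so the benefit of any particular ensemble has to be checked on held-out data.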

Taxonomy

Topics: Hearing Loss and Rehabilitation · Speech and Audio Processing · Voice and Speech Disorders