No Audiogram: Leveraging Existing Scores for Personalized Speech Intelligibility Prediction

Haoshuai Zhou; Changgeng Mo; Boxuan Cao; Linkai Li; Shan Xiang Wang

arXiv:2506.02039 · eess.AS · June 4, 2025

TL;DR

This paper introduces SSIPNet, a deep learning model that predicts an individual's speech intelligibility from that listener's existing intelligibility scores. It outperforms audiogram-based methods even when only a few support samples are available.

Contribution

The paper presents a deep learning approach that predicts personalized speech intelligibility from support samples, i.e., (audio, score) pairs already recorded for a listener, without relying on audiograms.

Findings

01 Outperforms audiogram-based prediction with only a few support samples

02 Effective on the Clarity Prediction Challenge dataset

03 Introduces a new paradigm for personalized speech intelligibility prediction

Abstract

Personalized speech intelligibility prediction is challenging. Previous approaches have mainly relied on audiograms, which are inherently limited in accuracy as they only capture a listener's hearing threshold for pure tones. Rather than incorporating additional listener features, we propose a novel approach that leverages an individual's existing intelligibility data to predict their performance on new audio. We introduce the Support Sample-Based Intelligibility Prediction Network (SSIPNet), a deep learning model that leverages speech foundation models to build a high-dimensional representation of a listener's speech recognition ability from multiple support (audio, score) pairs, enabling accurate predictions for unseen audio. Results on the Clarity Prediction Challenge dataset show that, even with a small number of support (audio, score) pairs, our method outperforms audiogram-based…
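
To make the support-sample idea concrete, here is a minimal sketch of how a listener's known (audio, score) pairs can drive a prediction for new audio. It is deliberately much simpler than SSIPNet and is not the paper's architecture: instead of learning a high-dimensional listener representation, it takes a softmax-weighted average of the listener's support scores, weighting each support clip by its cosine similarity to the query clip. The names `predict_score`, `SupportPair`, and `temperature` are hypothetical, and the audio embeddings are assumed to come from some speech foundation model; here they are just plain lists of floats.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical support pair: (audio embedding, listener's intelligibility score).
# In a real system the embedding would come from a speech foundation model;
# in this sketch it is an arbitrary list of floats.
SupportPair = Tuple[Sequence[float], float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def predict_score(query: Sequence[float], support: List[SupportPair],
                  temperature: float = 0.1) -> float:
    """Predict this listener's score on `query` as a similarity-weighted
    average of their known support scores (a hypothetical baseline, not SSIPNet)."""
    if not support:
        raise ValueError("need at least one support (audio, score) pair")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Scaled similarity of the query to each support clip.
    sims = [cosine(query, emb) / temperature for emb, _ in support]
    # Softmax over similarities; subtract the max for numerical stability.
    m = max(sims)
    weights = [math.exp(s - m) for s in sims]
    total = sum(weights)
    return sum(w * score for w, (_, score) in zip(weights, support)) / total


if __name__ == "__main__":
    # Two toy support clips for one listener: one scored 90, one scored 20.
    support = [([1.0, 0.0], 90.0), ([0.0, 1.0], 20.0)]
    print(predict_score([0.9, 0.1], support))  # close to 90: query resembles clip 1
    print(predict_score([1.0, 1.0], support))  # 55.0: equally similar to both
```

The `temperature` parameter controls how sharply the prediction follows the most similar support clip: small values approach nearest-neighbour lookup, large values approach the listener's mean score. A learned model such as SSIPNet can in principle capture far richer interactions between the listener's supports and the new audio than this fixed similarity rule.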

Taxonomy

Topics: Speech Recognition and Synthesis