Aphasic Speech Recognition using a Mixture of Speech Intelligibility Experts
Matthew Perez, Zakaria Aldeneh, Emily Mower Provost

TL;DR
This paper introduces a mixture-of-experts acoustic model for aphasic speech recognition that weights severity-based experts by estimated speech intelligibility, significantly reducing phone error rates relative to a baseline.
Contribution
The paper presents a novel severity-based mixture-of-experts acoustic model that uses a speech intelligibility detector (SID) to weight the experts at test time.
Findings
Significant reduction in phone error rates across all severity stages
Effective use of a speech intelligibility detector (SID) for expert weighting
Improved robustness over baseline models
Abstract
Robust speech recognition is a key prerequisite for semantic feature extraction in automatic aphasic speech analysis. However, standard one-size-fits-all automatic speech recognition models perform poorly when applied to aphasic speech. One reason for this is the wide range of speech intelligibility due to different levels of severity (i.e., higher severity lends itself to less intelligible speech). To address this, we propose a novel acoustic model based on a mixture of experts (MoE), which handles the varying intelligibility stages present in aphasic speech by explicitly defining severity-based experts. At test time, the contribution of each expert is decided by estimating speech intelligibility with a speech intelligibility detector (SID). We show that our proposed approach significantly reduces phone error rates across all severity stages in aphasic speech compared to a baseline…
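The test-time weighting described in the abstract can be illustrated with a minimal sketch. It assumes that each severity-based expert emits a phone posterior for a frame, that the SID emits one raw score per severity stage, and that the outputs are merged as a convex combination. The function names, the number of experts, and the point at which outputs are combined are all hypothetical; the paper's actual architecture may mix experts at a different layer.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw SID scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_experts(
    expert_posteriors: Sequence[Sequence[float]],
    sid_weights: Sequence[float],
) -> List[float]:
    """Combine per-expert phone posteriors for one frame.

    expert_posteriors[k][p] is expert k's probability for phone p.
    sid_weights[k] is the (assumed) SID weight for severity stage k.
    """
    if len(expert_posteriors) != len(sid_weights):
        raise ValueError("need exactly one weight per expert")
    num_phones = len(expert_posteriors[0])
    mixed = [0.0] * num_phones
    for weight, posterior in zip(sid_weights, expert_posteriors):
        if len(posterior) != num_phones:
            raise ValueError("experts must share one phone inventory")
        for p, prob in enumerate(posterior):
            mixed[p] += weight * prob
    return mixed


if __name__ == "__main__":
    # Hypothetical frame: two experts, three phones, made-up SID scores.
    weights = softmax([1.2, -0.3])
    frame = mix_experts([[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]], weights)
    print(weights, frame)
```

Because the weights sum to 1 and each expert output is a distribution, the mixed output is also a valid distribution, so it can feed a decoder in the same way as a single acoustic model's output.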
