Adapting Foundation Speech Recognition Models to Impaired Speech: A Semantic Re-chaining Approach for Personalization of German Speech
Niclas Pokel, Pehuén Moure, Roman Boehringer, Yingqiang Gao

TL;DR
This paper introduces a lightweight pipeline for personalizing speech recognition models. Through semantic re-chaining, it improves transcription accuracy on impaired speech and addresses both the scarcity of training data and the difficulty of recognizing non-normative speech.
Contribution
It presents a novel semantic re-chaining approach to personalize foundation ASR models for impaired speech, improving performance with minimal data.
Findings
Improved transcription accuracy on impaired speech data.
Semantically enriching the small training dataset improves model robustness.
Potential to reduce communication barriers for impaired speakers.
Abstract
Speech impairments caused by conditions such as cerebral palsy or genetic disorders pose significant challenges for automatic speech recognition (ASR) systems. Despite recent advances, ASR models like Whisper struggle with non-normative speech because training data is limited and non-normative speech samples are difficult to collect and annotate. In this work, we propose a practical and lightweight pipeline for personalizing ASR models that formalizes the selection of words and enriches a small dataset of impaired speech with semantic coherence. Applied to data from a child with a structural speech impairment, our approach shows promising improvements in transcription quality, demonstrating the potential to reduce communication barriers for individuals with atypical speech patterns.
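Improvements in ASR transcription quality are commonly reported as word error rate (WER): the minimum number of word substitutions, deletions, and insertions needed to turn the model's output into the reference transcript, divided by the number of reference words. This summary does not state which metric the paper uses, so the following is only a minimal sketch of standard WER. It assumes whitespace tokenization and case-insensitive comparison, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER sketch: word-level Levenshtein distance / reference length.

    Assumption: tokens are whitespace-separated and compared case-insensitively.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words (one DP row at a time).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or exact match)
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical German example: one substitution ("möchte" -> "mochte")
# and one deletion ("einen") over five reference words gives 2 / 5.
print(word_error_rate("ich möchte einen Apfel essen", "ich mochte Apfel essen"))  # 0.4
```

A lower WER means a more accurate transcript. Comparing WER before and after personalization on held-out recordings from the same speaker is one common way to quantify this kind of improvement.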
Taxonomy
Topics: Voice and Speech Disorders · Speech Recognition and Synthesis · Language Development and Disorders
