Enhancing Dysarthric Speech Recognition for Unseen Speakers via Prototype-Based Adaptation

Shiyao Wang; Shiwan Zhao; Jiaming Zhou; Aobo Kong; Yong Qin

arXiv:2407.18461 · cs.SD · September 25, 2024

TL;DR

This paper proposes a prototype-based method that uses HuBERT features and supervised contrastive learning to improve dysarthric speech recognition for unseen speakers without per-speaker fine-tuning.

Contribution

Introduces a novel prototype-based adaptation approach that leverages HuBERT features and contrastive learning to improve recognition accuracy for new dysarthric speakers without additional model fine-tuning.

Findings

01 Significant performance gains on unseen speakers

02 Effective personalization without fine-tuning

03 Prototypes capture speaker-specific speech characteristics

Abstract

Dysarthric speech recognition (DSR) presents a formidable challenge due to inherent inter-speaker variability, leading to severe performance degradation when applying DSR models to new dysarthric speakers. Traditional speaker adaptation methodologies typically involve fine-tuning models for each speaker, but this strategy is cost-prohibitive and inconvenient for disabled users, requiring substantial data collection. To address this issue, we introduce a prototype-based approach that markedly improves DSR performance for unseen dysarthric speakers without additional fine-tuning. Our method employs a feature extractor trained with HuBERT to produce per-word prototypes that encapsulate the characteristics of previously unseen speakers. These prototypes serve as the basis for classification. Additionally, we incorporate supervised contrastive learning to refine feature extraction. By…
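The prototype idea in the abstract can be illustrated with a minimal sketch. It assumes that a prototype is the mean of a speaker's enrolment embeddings for one word and that a query is assigned to the word with the most similar prototype under cosine similarity. Both choices are assumptions made for illustration, the function names are hypothetical, and the toy 3-dimensional vectors stand in for features that the paper obtains from a HuBERT-based extractor. The official repository may implement these steps differently.

```python
import math
from collections import defaultdict


def build_prototypes(embeddings, labels):
    """Average the enrolment embeddings of each word into one prototype.

    Assumption: a prototype is a plain mean of the word's embeddings.
    """
    grouped = defaultdict(list)
    for vec, word in zip(embeddings, labels):
        grouped[word].append(vec)
    return {
        word: [sum(col) / len(vecs) for col in zip(*vecs)]
        for word, vecs in grouped.items()
    }


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify(query, prototypes):
    """Return the word whose prototype is most similar to the query.

    Assumption: cosine similarity is the matching score.
    """
    return max(prototypes, key=lambda word: cosine(query, prototypes[word]))


# Toy enrolment data for one unseen speaker (hypothetical values).
enrol_x = [
    [0.9, 0.1, 0.0],
    [1.0, 0.0, 0.1],
    [0.1, 1.0, 0.0],
    [0.0, 0.9, 0.1],
]
enrol_y = ["yes", "yes", "no", "no"]

prototypes = build_prototypes(enrol_x, enrol_y)
prediction = classify([0.8, 0.2, 0.0], prototypes)
print(prediction)
```

No model weights change in this sketch: adapting to a new speaker only requires recomputing the prototype dictionary from that speaker's enrolment utterances, which is the sense in which the approach avoids fine-tuning.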

Code & Models

Repositories

nku-hlt/pb-dsr (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Voice and Speech Disorders · Phonetics and Phonology Research

Methods: Contrastive Learning
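
The abstract names supervised contrastive learning as the variant used to refine feature extraction. As a rough illustration only, the sketch below computes a supervised contrastive loss in the commonly used form where each anchor is pulled toward all other samples with the same label and pushed away from the rest. The temperature value, the function names, and the toy 2-dimensional features are assumptions; the paper's exact loss and hyperparameters are not given in this summary.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supcon_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss averaged over anchors that have positives.

    Assumption: temperature=0.1 is an illustrative default, not a value
    taken from the paper.
    """
    z = [normalize(f) for f in features]
    n = len(z)
    # Pairwise scaled cosine similarities.
    sim = [
        [sum(a * b for a, b in zip(z[i], z[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Denominator runs over every sample except the anchor itself.
        log_denom = math.log(sum(math.exp(sim[i][a]) for a in range(n) if a != i))
        total += -sum(sim[i][p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy features: two tight clusters (hypothetical values).
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_good = supcon_loss(feats, ["a", "a", "b", "b"])  # labels match clusters
loss_bad = supcon_loss(feats, ["a", "b", "a", "b"])   # labels cut across clusters
print(loss_good < loss_bad)
```

The loss is lower when same-label features already sit close together, so minimizing it encourages embeddings of the same word to cluster, which is what mean-based prototypes rely on.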