Phone-purity Guided Discrete Tokens for Dysarthric Speech Recognition
Huimeng Wang, Xurong Xie, Mengzhe Geng, Shujie Hu, Haoning Xu, Youjun Chen, Zhaoqing Li, Jiajun Deng, Xunying Liu

TL;DR
This paper introduces phone-purity guided discrete tokens for dysarthric speech recognition, enhancing phonetic discrimination and significantly reducing word error rates compared to traditional token extraction methods.
Contribution
It proposes a novel phone-purity guided approach that regularizes discrete token extraction, improving recognition accuracy for disordered speech.
Findings
PPG discrete tokens outperform non-PPG tokens in WER reduction
Significant WER reductions of up to 1.77% absolute (4.82% relative)
Sharper decision boundaries in token clustering demonstrated by visualization
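The findings quote WER reductions in both absolute and relative terms. As a reminder of how the two relate, here is a minimal sketch. The function name `wer_reduction` and the numbers in the usage note are hypothetical and are not taken from the paper.

```python
def wer_reduction(baseline_wer: float, new_wer: float) -> tuple[float, float]:
    """Return (absolute, relative) WER reduction, both in percent.

    absolute = baseline - new            (percentage points)
    relative = 100 * absolute / baseline (percent of the baseline WER)
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    absolute = baseline_wer - new_wer
    relative = 100.0 * absolute / baseline_wer
    return absolute, relative
```

For example, with made-up values, a system that lowers WER from 30.0% to 28.5% achieves a 1.5% absolute and 5.0% relative reduction.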
Abstract
Discrete tokens extracted provide efficient and domain adaptable speech features. Their application to disordered speech that exhibits articulation imprecision and large mismatch against normal voice remains unexplored. To improve their phonetic discrimination that is weakened during unsupervised K-means or vector quantization of continuous features, this paper proposes novel phone-purity guided (PPG) discrete tokens for dysarthric speech recognition. Phonetic label supervision is used to regularize maximum likelihood and reconstruction error costs used in standard K-means and VAE-VQ based discrete token extraction. Experiments conducted on the UASpeech corpus suggest that the proposed PPG discrete token features extracted from HuBERT consistently outperform hybrid TDNN and End-to-End (E2E) Conformer systems using non-PPG based K-means or VAE-VQ tokens across varying codebook sizes by…
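To make the idea of phone purity concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes frame-level features with aligned integer phone labels, measures purity as the fraction of frames whose phone matches the majority phone of their cluster, and adds a hypothetical penalty `lam * (1 - P(phone | cluster))` to the usual squared-distance K-means assignment cost. The function names, the penalty form, and the `lam` weight are all illustrative assumptions; the paper's actual regularized objective may differ.

```python
import numpy as np

def cluster_phone_counts(assign, phones, n_clusters, n_phones):
    """Count how often each phone label occurs in each cluster."""
    counts = np.zeros((n_clusters, n_phones))
    np.add.at(counts, (assign, phones), 1)
    return counts

def phone_purity(assign, phones, n_clusters, n_phones):
    """Fraction of frames whose phone equals their cluster's majority phone."""
    counts = cluster_phone_counts(assign, phones, n_clusters, n_phones)
    return counts.max(axis=1).sum() / len(assign)

def purity_guided_kmeans(feats, phones, n_clusters, n_phones,
                         lam=1.0, n_iter=10, seed=0):
    """K-means with a hypothetical phone-purity penalty on assignment.

    ASSUMPTION: the cost below is an illustrative regularizer, not the
    formulation used in the paper.
    """
    rng = np.random.default_rng(seed)
    init = rng.choice(len(feats), n_clusters, replace=False)
    centroids = feats[init].astype(float)

    def sq_dist(c):
        # Squared Euclidean distance of every frame to every centroid: (N, K)
        return ((feats[:, None, :] - c[None, :, :]) ** 2).sum(axis=-1)

    # Start from plain nearest-centroid assignment.
    assign = sq_dist(centroids).argmin(axis=1)
    for _ in range(n_iter):
        counts = cluster_phone_counts(assign, phones, n_clusters, n_phones)
        # P(phone | cluster) with add-one smoothing, shape (K, n_phones)
        post = (counts + 1) / (counts + 1).sum(axis=1, keepdims=True)
        # Penalize clusters in which the frame's own phone is rare: (N, K)
        penalty = 1.0 - post[:, phones].T
        assign = (sq_dist(centroids) + lam * penalty).argmin(axis=1)
        # Standard centroid update; empty clusters keep their old centroid.
        for k in range(n_clusters):
            members = assign == k
            if members.any():
                centroids[k] = feats[members].mean(axis=0)
    return centroids, assign
```

Setting `lam=0` reduces the assignment step to standard K-means, which gives a simple baseline against which the purity of the regularized tokens can be compared via `phone_purity`.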
Taxonomy
Topics: Speech Recognition and Synthesis · Voice and Speech Disorders · Speech and Audio Processing
