Perceiver-Prompt: Flexible Speaker Adaptation in Whisper for Chinese Disordered Speech Recognition
Yicong Jiang, Tianzi Wang, Xurong Xie, Juan Liu, Wei Sun, Nan Yan, Hui Chen, Lan Wang, Xunying Liu, Feng Tian

TL;DR
This paper presents Perceiver-Prompt, a speaker adaptation method that combines P-Tuning with a trainable Perceiver to improve Chinese disordered speech recognition with Whisper, achieving a relative CER reduction of up to 13.04%.
Contribution
It introduces Perceiver-Prompt, combining P-Tuning and a Perceiver for effective speaker adaptation in disordered speech recognition.
Findings
Relative CER reduction of up to 13.04% on the authors' Chinese dysarthric speech dataset.
Perceiver-Prompt improves recognition performance over fine-tuned Whisper.
Demonstrates effectiveness of speaker prompts in disordered speech recognition.
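For readers unfamiliar with the metric, the following is a minimal sketch of how a character error rate (CER) and a relative reduction between two systems are typically computed. The function names and the example numbers are illustrative assumptions, not the paper's scoring code or its reported results.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline_cer, adapted_cer):
    """Relative CER reduction in percent, measured against the baseline."""
    if baseline_cer <= 0:
        raise ValueError("baseline CER must be positive")
    return 100.0 * (baseline_cer - adapted_cer) / baseline_cer


# Hypothetical numbers for illustration only (not taken from the paper).
example_cer = cer("abcd", "abed")
example_gain = relative_reduction(20.0, 18.0)
```

A relative reduction is measured against the baseline's own error rate, so it is a different quantity from an absolute drop in CER percentage points.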
Abstract
Disordered speech recognition has profound implications for improving the quality of life for individuals afflicted with, for example, dysarthria. Dysarthric speech recognition encounters challenges including limited data, substantial dissimilarities between dysarthric and non-dysarthric speakers, and significant speaker variations stemming from the disorder. This paper introduces Perceiver-Prompt, a method for speaker adaptation that utilizes P-Tuning on the Whisper large-scale model. We first fine-tune Whisper using LoRA and then integrate a trainable Perceiver to generate fixed-length speaker prompts from variable-length inputs, to improve model recognition of Chinese dysarthric speech. Experimental results from our Chinese dysarthric speech dataset demonstrate consistent improvements in recognition performance with Perceiver-Prompt. Relative reduction up to 13.04% in CER is obtained…
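The central idea of mapping variable-length inputs to fixed-length speaker prompts can be sketched with a single cross-attention step, in which a fixed set of latent query vectors attends over however many input frames an utterance has. This is a simplified illustration under stated assumptions: the names `perceiver_prompt`, `latents`, and the dimensions are hypothetical, the latents are random here rather than trained, and the learned key/value projections and stacked layers of a full Perceiver are omitted. It is not the authors' architecture, and how the prompts are fed into Whisper is not shown.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def perceiver_prompt(features, latents):
    """Cross-attend fixed latent queries over a variable-length sequence.

    features: list of T vectors of size d (T varies per utterance).
    latents:  list of N query vectors of size d (N is fixed).
    Returns N vectors of size d, independent of T.
    """
    dim = len(latents[0])
    scale = 1.0 / math.sqrt(dim)
    prompts = []
    for query in latents:
        # Scaled dot-product score between this query and every frame.
        scores = [scale * sum(q * k for q, k in zip(query, frame))
                  for frame in features]
        weights = softmax(scores)
        # Attention-weighted average of the frames.
        prompts.append([sum(w * frame[j] for w, frame in zip(weights, features))
                        for j in range(dim)])
    return prompts


rng = random.Random(0)
DIM, NUM_LATENTS = 8, 4  # hypothetical sizes chosen for the example


def random_sequence(length):
    """Stand-in for per-frame speaker features of one utterance."""
    return [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(length)]


latents = random_sequence(NUM_LATENTS)  # would be trainable parameters
short_prompt = perceiver_prompt(random_sequence(5), latents)
long_prompt = perceiver_prompt(random_sequence(37), latents)
```

Both `short_prompt` and `long_prompt` contain `NUM_LATENTS` vectors of size `DIM`, even though the inputs have 5 and 37 frames. The output length is set by the number of latent queries rather than by the input length, which is what makes the prompt fixed-length.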
Taxonomy
Topics: Educational Reforms and Innovations · Speech Recognition and Synthesis
