Perceiver-Prompt: Flexible Speaker Adaptation in Whisper for Chinese Disordered Speech Recognition

Yicong Jiang; Tianzi Wang; Xurong Xie; Juan Liu; Wei Sun; Nan Yan; Hui Chen; Lan Wang; Xunying Liu; Feng Tian

arXiv:2406.09873 · eess.AS · June 17, 2024

Open Access

TL;DR

This paper presents Perceiver-Prompt, a speaker adaptation method that applies P-Tuning to Whisper and uses a Perceiver to generate speaker prompts for Chinese disordered speech recognition, achieving a relative CER reduction of up to 13.04%.

Contribution

The paper introduces Perceiver-Prompt, which combines P-Tuning with a trainable Perceiver for speaker adaptation in disordered speech recognition.

Findings

01. Up to 13.04% relative CER reduction on a Chinese dysarthric speech dataset.

02. Perceiver-Prompt improves recognition performance over fine-tuned Whisper.

03. Demonstrates the effectiveness of speaker prompts in disordered speech recognition.
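
For readers unfamiliar with the metric, the following is a minimal sketch of how a character error rate (CER) and a relative CER reduction are typically computed. It assumes the standard edit-distance definition of CER; the function names, the example sentence, and the 20% and 15% values are hypothetical illustrations, not figures or code from the paper.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline: float, adapted: float) -> float:
    """Relative error reduction of an adapted system over a baseline."""
    return (baseline - adapted) / baseline


# Hypothetical transcript pair: one substituted character out of six.
print(cer("今天天气很好", "今天天汽很好"))

# Hypothetical CERs, chosen only to illustrate the arithmetic.
print(f"{relative_reduction(0.20, 0.15):.2%}")  # prints 25.00%
```

A relative reduction is measured against the baseline error rate, so it is not the same as an absolute drop in percentage points.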

Abstract

Disordered speech recognition has profound implications for improving the quality of life for individuals afflicted with, for example, dysarthria. Dysarthric speech recognition encounters challenges including limited data, substantial dissimilarities between dysarthric and non-dysarthric speakers, and significant speaker variations stemming from the disorder. This paper introduces Perceiver-Prompt, a method for speaker adaptation that utilizes P-Tuning on the Whisper large-scale model. We first fine-tune Whisper using LoRA and then integrate a trainable Perceiver to generate fixed-length speaker prompts from variable-length inputs, to improve model recognition of Chinese dysarthric speech. Experimental results from our Chinese dysarthric speech dataset demonstrate consistent improvements in recognition performance with Perceiver-Prompt. Relative reduction up to 13.04% in CER is obtained…
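
The idea of producing fixed-length prompts from variable-length inputs can be illustrated with a Perceiver-style cross-attention step, in which a fixed set of latent query vectors attends over however many input frames are available. The sketch below is dependency-free Python written for illustration only: the class name, dimensions, random initialization, and single-head, single-layer design are all assumptions, and it is not the authors' architecture. How the resulting prompts are fed into Whisper is not specified here and is left out.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Small random matrix standing in for trained weights or features."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class PerceiverPromptSketch:
    """Hypothetical single-head cross-attention from latents to frames."""

    def __init__(self, num_prompts, feat_dim, model_dim, seed=0):
        rng = random.Random(seed)
        # Latent queries: their count fixes the output length.
        self.latents = random_matrix(num_prompts, model_dim, rng)
        self.w_q = random_matrix(model_dim, model_dim, rng)
        self.w_k = random_matrix(feat_dim, model_dim, rng)
        self.w_v = random_matrix(feat_dim, model_dim, rng)
        self.scale = 1.0 / math.sqrt(model_dim)

    def __call__(self, frames):
        if not frames:
            raise ValueError("need at least one input frame")
        q = matmul(self.latents, self.w_q)    # (P, D)
        k = matmul(frames, self.w_k)          # (T, D)
        v = matmul(frames, self.w_v)          # (T, D)
        k_t = [list(col) for col in zip(*k)]  # (D, T)
        scores = matmul(q, k_t)               # (P, T)
        attn = [softmax([s * self.scale for s in row]) for row in scores]
        return matmul(attn, v)                # (P, D), independent of T


model = PerceiverPromptSketch(num_prompts=4, feat_dim=8, model_dim=6)
rng = random.Random(1)
short_utt = random_matrix(5, 8, rng)   # 5 hypothetical feature frames
long_utt = random_matrix(37, 8, rng)   # 37 hypothetical feature frames
for frames in (short_utt, long_utt):
    prompts = model(frames)
    print(len(frames), "frames ->", len(prompts), "x", len(prompts[0]))
```

Because the attention weights are normalized over the time axis, the output shape depends only on the number of latent queries, which is what makes the prompt length fixed regardless of utterance duration. In a trained system the latents and projection matrices would be learned parameters rather than random values.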

Taxonomy

Topics: Educational Reforms and Innovations · Speech Recognition and Synthesis