Optimal Transport-based Adaptation in Dysarthric Speech Tasks
Rosanna Turrisi, Leonardo Badino

TL;DR
This paper introduces an optimal transport-based multi-source domain adaptation method for dysarthric speech, significantly improving detection accuracy, command recognition, and dysarthria diagnosis by leveraging speaker similarity measures.
Contribution
It proposes MSDA-WJDOT, a novel optimal transport approach for dysarthric speech adaptation, outperforming existing models in detection, recognition, and diagnosis tasks.
Findings
0.9% improvement in dysarthria detection accuracy over the best competing method
16% reduction in command error rate
95% accuracy in dysarthria diagnosis
Abstract
In many real-world applications, the mismatch between the distributions of training data (source) and test data (target) significantly degrades the performance of machine learning algorithms. In speech data, causes of this mismatch include different acoustic environments or speaker characteristics. In this paper, we address this issue in the challenging context of dysarthric speech through multi-source domain/speaker adaptation (MSDA/MSSA). Specifically, we propose an optimal transport-based approach called MSDA via Weighted Joint Optimal Transport (MSDA-WJDOT). We address the mismatch problem in dysarthria detection, where the proposed approach outperforms both the Baseline and the state-of-the-art MSDA models, improving detection accuracy by 0.9% over the best competing method. We then employ MSDA-WJDOT for dysarthric speaker adaptation in command speech recognition. This…
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Voice and Speech Disorders
