Interpretable Dysarthric Speaker Adaptation based on Optimal-Transport
Rosanna Turrisi, Leonardo Badino

TL;DR
This paper introduces an interpretable, unsupervised multi-source domain adaptation method based on optimal transport for dysarthric speech recognition. It reduces the command error rate and can be used to diagnose dysarthria without any specific training.
Contribution
It proposes MSDA-WJDOT, a novel unsupervised Multi-Source Domain Adaptation (MSDA) algorithm based on weighted joint optimal transport. The model is interpretable and can be used to diagnose dysarthria directly from speech data.
Findings
Achieved a relative Command Error Rate reduction of 16% over the speaker-independent model and 7% over the best competitor method
Attained 95% accuracy in dysarthria diagnosis
Provided a measure of similarity between speakers for diagnosis
Abstract
This work addresses the mismatch between the distribution of training data (source) and testing data (target) in the challenging context of dysarthric speech recognition. We focus on Speaker Adaptation (SA) in command speech recognition, where data from multiple sources (i.e., multiple speakers) are available. Specifically, we propose an unsupervised Multi-Source Domain Adaptation (MSDA) algorithm based on optimal transport, called MSDA via Weighted Joint Optimal Transport (MSDA-WJDOT). We achieve a Command Error Rate relative reduction of 16% and 7% over the speaker-independent model and the best competitor method, respectively. The strength of the proposed approach is that, unlike any other existing SA method, it offers an interpretable model that can also be exploited, in this context, to diagnose dysarthria without any specific training. Indeed, it provides a…
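The 16% and 7% figures in the abstract are relative reductions, meaning they are measured as a fraction of the reference system's error rate rather than in absolute percentage points. The short sketch below shows how such a figure is typically computed. The function name `relative_reduction` and the two error rates are hypothetical, chosen only to illustrate the arithmetic; they are not values from the paper.

```python
def relative_reduction(baseline_cer: float, adapted_cer: float) -> float:
    """Return the reduction of adapted_cer as a fraction of baseline_cer."""
    if baseline_cer <= 0:
        raise ValueError("baseline_cer must be positive")
    return (baseline_cer - adapted_cer) / baseline_cer


# Hypothetical error rates (NOT taken from the paper), for illustration only.
baseline = 0.25  # assumed Command Error Rate of a reference model
adapted = 0.21   # assumed Command Error Rate after adaptation

print(f"absolute reduction: {baseline - adapted:.2f}")
print(f"relative reduction: {relative_reduction(baseline, adapted):.0%}")
```

With these made-up numbers, an absolute drop of four percentage points corresponds to a much larger relative reduction, which is why the two should not be confused when comparing systems.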
