Unsupervised Domain Adaptation for Dysarthric Speech Detection via Domain Adversarial Training and Mutual Information Minimization
Disong Wang, Liqun Deng, Yu Ting Yeung, Xiao Chen, Xunying Liu, Helen Meng

TL;DR
This paper introduces an unsupervised domain adaptation approach for dysarthric speech detection, combining domain adversarial training and mutual information minimization to improve cross-domain performance without labeled target data.
Contribution
It formulates dysarthric speech detection as an unsupervised domain adaptation problem and proposes a multi-task learning strategy to learn domain-invariant, dysarthria-discriminative features.
Findings
Achieved 22.2% increase in utterance-level recall.
Achieved 20.0% increase in speaker-level accuracy.
Demonstrated effectiveness on UASPEECH and TORGO datasets.
Abstract
Dysarthric speech detection (DSD) systems aim to detect characteristics of the neuromotor disorder from speech. Such systems are particularly susceptible to domain mismatch where the training and testing data come from the source and target domains respectively, but the two domains may differ in terms of speech stimuli, disease etiology, etc. It is hard to acquire labelled data in the target domain, due to high costs of annotating sizeable datasets. This paper makes a first attempt to formulate cross-domain DSD as an unsupervised domain adaptation (UDA) problem. We use labelled source-domain data and unlabelled target-domain data, and propose a multi-task learning strategy, including dysarthria presence classification (DPC), domain adversarial training (DAT) and mutual information minimization (MIM), which aim to learn dysarthria-discriminative and domain-invariant biomarker embeddings.…
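One generic way to write down a multi-task objective of the kind the abstract describes is sketched below. It is an illustrative form under stated assumptions, not the paper's exact formulation: the encoder $E$, dysarthria classifier $C$, domain discriminator $D$, domain label $d_x$, weights $\lambda_1, \lambda_2$, and the mutual-information estimate $\hat{I}$ are hypothetical symbols introduced here, and the adversarial min-max shape is the standard domain adversarial training setup rather than something the abstract confirms.

```latex
% Illustrative sketch only; all symbols are assumptions, not taken from the paper.
% D_S: labelled source-domain data, D_T: unlabelled target-domain data.
\begin{align}
\mathcal{L}_{\mathrm{DPC}} &= \mathbb{E}_{(x,y)\sim\mathcal{D}_S}
    \big[\mathrm{CE}\big(C(E(x)),\, y\big)\big] \\
\mathcal{L}_{\mathrm{DAT}} &= \mathbb{E}_{x\sim\mathcal{D}_S\cup\mathcal{D}_T}
    \big[\mathrm{CE}\big(D(E(x)),\, d_x\big)\big] \\
\min_{E,\,C}\;\; &\mathcal{L}_{\mathrm{DPC}}
    \;-\; \lambda_1\,\mathcal{L}_{\mathrm{DAT}}
    \;+\; \lambda_2\,\hat{I}\big(E(x);\, d_x\big),
\qquad
\min_{D}\;\; \mathcal{L}_{\mathrm{DAT}}
\end{align}
```

Under this reading, the DPC term uses only labelled source data, while the DAT and MIM terms need no dysarthria labels and can therefore also draw on the unlabelled target data. The discriminator is trained to tell the domains apart, and the encoder is trained to defeat it, which pushes the embedding toward domain invariance.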
Taxonomy
Topics: Voice and Speech Disorders · Speech Recognition and Synthesis · Music and Audio Processing
