Toward Real-World Voice Disorder Classification
Heng-Cheng Kuo, Yu-Peng Hsieh, Huan-Hsin Tseng, Chi-Te Wang, Shih-Hau Fang, and Yu Tsao

TL;DR
This paper presents a compact, resource-efficient voice disorder classification system that employs domain adversarial training to improve robustness in noisy real-world environments, achieving high accuracy with minimal resource use.
Contribution
It introduces a novel system combining factorized CNNs and domain adversarial training to address domain mismatch and resource constraints in voice disorder classification.
Findings
13% improvement in unweighted average recall in noisy environments
80% recall maintained in clinical domain with slight degradation
Reduced memory and computation by over 73.9%
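The first two findings are stated in terms of recall. As a minimal sketch of the metric, unweighted average recall (UAR) is the plain mean of the per-class recalls, so every class counts equally regardless of how many samples it has. The function name, labels, and toy predictions below are hypothetical illustrations, not data from the paper.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall; each class counts equally regardless of size."""
    hits = defaultdict(int)    # correctly predicted samples per true class
    totals = defaultdict(int)  # number of samples per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy labels (not results from the paper).
y_true = ["healthy"] * 8 + ["neoplasm"] * 2 + ["benign"] * 2
y_pred = ["healthy"] * 8 + ["neoplasm", "healthy"] + ["benign", "healthy"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.3f}, accuracy = {accuracy:.3f}")
```

In this toy case plain accuracy is higher than UAR because the majority class dominates it, which is why UAR is a common choice when classes are imbalanced.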
Abstract
Objective: Voice disorders significantly compromise individuals' ability to speak in their daily lives. Without early diagnosis and treatment, these disorders may deteriorate drastically. Thus, automatic classification systems that can be used at home are desirable for people who lack access to clinical disease assessments. However, the performance of such systems may be weakened by constrained resources and by the domain mismatch between clinical data and noisy real-world data. Methods: This study develops a compact and domain-robust voice disorder classification system that classifies utterances as healthy, neoplasm, or benign structural disease. The proposed system uses a feature extractor composed of factorized convolutional neural networks and then applies domain adversarial training to reconcile the domain mismatch by extracting domain-invariant features. Results: The…
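To give a rough sense of why factorizing a convolution saves resources, the sketch below compares weight counts for a standard 2-D convolution and a depthwise-separable one. This is an assumption for illustration only: the abstract does not say which factorization the authors use, and the layer sizes and function names here are hypothetical.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a k x k convolution mapping c_in to c_out channels (no bias)."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv (no bias)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, not taken from the paper.
k, c_in, c_out = 3, 64, 128
standard = standard_conv_params(k, c_in, c_out)
separable = separable_conv_params(k, c_in, c_out)
reduction = 1 - separable / standard
print(f"standard={standard}, separable={separable}, reduction={reduction:.1%}")
```

The saving depends on kernel size and channel widths, so the figure printed here should not be read as the reduction reported in the paper.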
Taxonomy
Topics: Voice and Speech Disorders · Speech Recognition and Synthesis
