Optimizing Domain-Adaptive Self-Supervised Learning for Clinical Voice-Based Disease Classification
Weixin Liu, Bowen Qu, Matthew Pontell, Maria Powell, Bradley Malin, Zhijun Yin

TL;DR
This paper improves domain-adaptive self-supervised learning for clinical voice analysis by optimizing loss functions, normalization, and masking strategies, leading to better disease classification performance on a multi-institutional dataset.
Contribution
The paper systematically evaluates three SSL design choices for health-related audio (reconstruction loss, normalization, and masking strategy) and identifies a configuration that outperforms the standard setup.
Findings
Optimized SSL configuration improves Macro F1 score from 0.663 to 0.688.
Content-aware masking enhances performance by emphasizing informative regions.
Mean Absolute Error (MA-Error) loss increases robustness in clinical voice classification.
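For readers unfamiliar with the metric reported above, Macro F1 is the unweighted mean of per-class F1 scores, so rare classes count as much as common ones. The following is a minimal sketch; the function name `macro_f1` and the toy labels are illustrative assumptions and are not taken from the paper or its dataset.

```python
import numpy as np

def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1 scores (illustrative helper)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    scores = []
    for c in range(num_classes):
        tp = np.sum((y_true == c) & (y_pred == c))  # true positives for class c
        fp = np.sum((y_true != c) & (y_pred == c))  # false positives for class c
        fn = np.sum((y_true == c) & (y_pred != c))  # false negatives for class c
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp / denom if denom > 0 else 0.0)
    return float(np.mean(scores))

# Toy labels (hypothetical): class 0 is the majority class.
toy_true = [0, 0, 0, 0, 1, 1]
toy_pred = [0, 0, 0, 1, 1, 0]
toy_score = macro_f1(toy_true, toy_pred, num_classes=2)
```

Because each class contributes equally, a model that ignores a minority diagnosis is penalized even if its overall accuracy is high.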
Abstract
The human voice is a promising non-invasive digital biomarker, yet deep learning for voice-based health analysis is hindered by data scarcity and domain mismatch, where models pre-trained on general audio fail to capture the subtle pathological features characteristic of clinical voice data. To address these challenges, we investigate domain-adaptive self-supervised learning (SSL) with Masked Autoencoders (MAE) and demonstrate that standard configurations are suboptimal for health-related audio. Using the Bridge2AI-Voice dataset, a multi-institutional collection of pathological voices, we systematically examine three performance-critical factors: reconstruction loss (Mean Absolute Error vs. Mean Squared Error), normalization (patch-wise vs. global), and masking (random vs. content-aware). Our optimized design, which combines Mean Absolute Error (MA-Error) loss, patch-wise normalization,…
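The three factors named in the abstract can be illustrated with a small NumPy sketch of how an MAE-style pipeline might build reconstruction targets, pick masked patches, and score a reconstruction. Everything here is an assumption made for illustration: the patch size, the 0.75 mask ratio, the helper names, and in particular the use of patch variance as the "informativeness" score for content-aware masking, since the summary does not specify how the authors define it.

```python
import numpy as np

def patchify(spec, patch):
    """Split a (freq, time) spectrogram into flattened non-overlapping patches."""
    f, t = spec.shape
    ph, pw = patch
    assert f % ph == 0 and t % pw == 0, "spectrogram must tile evenly"
    x = spec.reshape(f // ph, ph, t // pw, pw).transpose(0, 2, 1, 3)
    return x.reshape(-1, ph * pw)  # (num_patches, patch_dim)

def normalize_targets(patches, mode="patch", eps=1e-6):
    """Patch-wise: statistics per patch. Global: statistics over the whole input."""
    if mode == "patch":
        mean = patches.mean(axis=1, keepdims=True)
        var = patches.var(axis=1, keepdims=True)
    else:
        mean, var = patches.mean(), patches.var()
    return (patches - mean) / np.sqrt(var + eps)

def random_mask(num_patches, ratio, rng):
    """Mask a uniformly random subset of patches."""
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.permutation(num_patches)[: int(round(num_patches * ratio))]] = True
    return mask

def content_aware_mask(patches, ratio, rng):
    """ASSUMPTION: sample patches with probability proportional to their variance."""
    score = patches.var(axis=1) + 1e-8
    n_mask = int(round(len(patches) * ratio))
    idx = rng.choice(len(patches), size=n_mask, replace=False, p=score / score.sum())
    mask = np.zeros(len(patches), dtype=bool)
    mask[idx] = True
    return mask

def reconstruction_loss(pred, target, mask, kind="mae"):
    """Loss over masked patches only: mean absolute or mean squared error."""
    diff = pred[mask] - target[mask]
    return float(np.abs(diff).mean() if kind == "mae" else (diff ** 2).mean())

# Usage on a synthetic spectrogram (random data, not clinical audio).
rng = np.random.default_rng(0)
spec = rng.normal(size=(64, 96))
patches = patchify(spec, (16, 16))
target = normalize_targets(patches, mode="patch")
mask = content_aware_mask(patches, ratio=0.75, rng=rng)
pred = np.zeros_like(target)  # stand-in for a decoder output
mae = reconstruction_loss(pred, target, mask, kind="mae")
mse = reconstruction_loss(pred, target, mask, kind="mse")
```

Mean absolute error grows linearly with the residual while mean squared error grows quadratically, so a few large residuals dominate the squared loss far more than the absolute one. This is one plausible reading of the robustness finding, not a claim about the authors' analysis.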
Taxonomy
Topics: Voice and Speech Disorders · Phonocardiography and Auscultation Techniques · AI in cancer detection
