Optimizing Domain-Adaptive Self-Supervised Learning for Clinical Voice-Based Disease Classification

Weixin Liu; Bowen Qu; Matthew Pontell; Maria Powell; Bradley Malin; Zhijun Yin

arXiv:2601.22319 · eess.AS · February 2, 2026

TL;DR

This paper improves domain-adaptive self-supervised learning for clinical voice analysis by optimizing loss functions, normalization, and masking strategies, leading to better disease classification performance on a multi-institutional dataset.

Contribution

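The paper systematically compares self-supervised learning (SSL) components (reconstruction loss, normalization, and masking strategy) for health-related audio and identifies a configuration that outperforms the standard one on clinical voice-based disease classification.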

Findings

01

Optimized SSL configuration improves Macro F1 score from 0.663 to 0.688.

02

Content-aware masking enhances performance by emphasizing informative regions.

03

Mean Absolute Error (MA-Error) reconstruction loss increases robustness in clinical voice classification.
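
For readers unfamiliar with the metric in the first finding, Macro F1 is the unweighted mean of the per-class F1 scores, so rare classes count as much as common ones. The following is a minimal, dependency-free sketch of that definition; the function name `macro_f1` and the example labels are illustrative assumptions, not taken from the paper, and the paper's own evaluation code may differ.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (illustrative helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # predicted p, but it was wrong
            fn[t] += 1          # true class t was missed
    scores = []
    for c in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels, for illustration only
y_true = ["healthy", "healthy", "disorder", "disorder"]
y_pred = ["healthy", "disorder", "disorder", "disorder"]
score = macro_f1(y_true, y_pred)
```

Because every class contributes equally to the average, a model that ignores a minority diagnosis is penalized even if its overall accuracy is high.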

Abstract

The human voice is a promising non-invasive digital biomarker, yet deep learning for voice-based health analysis is hindered by data scarcity and domain mismatch, where models pre-trained on general audio fail to capture the subtle pathological features characteristic of clinical voice data. To address these challenges, we investigate domain-adaptive self-supervised learning (SSL) with Masked Autoencoders (MAE) and demonstrate that standard configurations are suboptimal for health-related audio. Using the Bridge2AI-Voice dataset, a multi-institutional collection of pathological voices, we systematically examine three performance-critical factors: reconstruction loss (Mean Absolute Error vs. Mean Squared Error), normalization (patch-wise vs. global), and masking (random vs. content-aware). Our optimized design, which combines Mean Absolute Error (MA-Error) loss, patch-wise normalization,…
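
The three factors named in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: patches are plain lists of floats, all function names are hypothetical, and the content-aware criterion (masking the patches with the highest mean absolute value) is an assumption made here for illustration, since the summary does not state how informative regions are scored.

```python
import random
from statistics import fmean, pvariance

EPS = 1e-6  # avoids division by zero for constant patches


def normalize_patchwise(patches):
    """Standardize each patch with its own mean and variance."""
    out = []
    for p in patches:
        mu, sd = fmean(p), (pvariance(p) + EPS) ** 0.5
        out.append([(x - mu) / sd for x in p])
    return out


def normalize_global(patches):
    """Standardize all patches with one shared mean and variance."""
    flat = [x for p in patches for x in p]
    mu, sd = fmean(flat), (pvariance(flat) + EPS) ** 0.5
    return [[(x - mu) / sd for x in p] for p in patches]


def reconstruction_loss(pred, target, masked_idx, kind="mae"):
    """Mean absolute ("mae") or mean squared ("mse") error over masked patches only."""
    if not masked_idx:
        raise ValueError("at least one masked patch is required")
    total, count = 0.0, 0
    for i in masked_idx:
        for a, b in zip(pred[i], target[i]):
            d = a - b
            total += abs(d) if kind == "mae" else d * d
            count += 1
    return total / count


def random_mask(n_patches, ratio, rng):
    """Pick a uniformly random subset of patch indices to mask."""
    return sorted(rng.sample(range(n_patches), int(n_patches * ratio)))


def content_aware_mask(patches, ratio):
    """ASSUMPTION: score patches by mean absolute value and mask the top ones."""
    k = int(len(patches) * ratio)
    energy = [fmean(abs(x) for x in p) for p in patches]
    ranked = sorted(range(len(patches)), key=lambda i: energy[i], reverse=True)
    return sorted(ranked[:k])


# Toy spectrogram patches, for illustration only
patches = [[0.0, 0.0], [1.0, 3.0], [10.0, 10.0], [0.5, 0.5]]
targets = normalize_patchwise(patches)
masked = content_aware_mask(patches, ratio=0.5)
zeros = [[0.0] * len(p) for p in patches]  # stand-in for a decoder's output
mae = reconstruction_loss(zeros, targets, masked, kind="mae")
mse = reconstruction_loss(zeros, targets, masked, kind="mse")
```

In this sketch the absolute-error loss grows linearly with each residual while the squared-error loss grows quadratically, which is the usual reason absolute error is less sensitive to outlier values.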

Taxonomy

Topics: Voice and Speech Disorders · Phonocardiography and Auscultation Techniques · AI in cancer detection