Making deep neural networks work for medical audio: representation, compression and domain adaptation

Charles C Onu

arXiv:2506.13970 · cs.SD · June 18, 2025

Open Access

TL;DR

This thesis advances automated analysis of medical audio, especially infant cry sounds, through transfer learning, model compression, and domain adaptation, and releases a new dataset, with the aim of improving healthcare accessibility.

Contribution

It introduces novel transfer learning, model compression, and domain adaptation techniques specifically tailored for medical audio analysis, along with a new open-source infant cry dataset.

Findings

01. Transfer learning improves infant cry classification accuracy.

02. Tensor decomposition achieves several hundred-fold model compression.

03. Domain adaptation enhances model generalization across datasets.
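
To give a sense of how a tensor decomposition can reach compression ratios in the hundreds, the sketch below counts parameters for a dense weight matrix and for a tensor-train (TT) factorization of the same matrix. This is an illustration under stated assumptions, not the thesis's method: this summary does not say which decomposition is used, and the layer size, mode shapes, and ranks are hypothetical values chosen for the example.

```python
from math import prod


def dense_params(in_modes, out_modes):
    """Parameter count of a dense matrix of shape prod(in_modes) x prod(out_modes)."""
    return prod(in_modes) * prod(out_modes)


def tt_params(in_modes, out_modes, ranks):
    """Parameter count of a tensor-train (TT) matrix factorization.

    Core k has shape (ranks[k], in_modes[k], out_modes[k], ranks[k + 1]),
    and the boundary ranks ranks[0] and ranks[-1] must equal 1.
    """
    if len(in_modes) != len(out_modes):
        raise ValueError("in_modes and out_modes must have the same length")
    if len(ranks) != len(in_modes) + 1:
        raise ValueError("ranks must have one more entry than the mode lists")
    if ranks[0] != 1 or ranks[-1] != 1:
        raise ValueError("boundary TT ranks must be 1")
    return sum(
        ranks[k] * in_modes[k] * out_modes[k] * ranks[k + 1]
        for k in range(len(in_modes))
    )


# Hypothetical 1024 x 1024 layer, reshaped into four modes per side.
in_modes = (4, 8, 8, 4)    # 4 * 8 * 8 * 4 = 1024 inputs
out_modes = (4, 8, 8, 4)   # 4 * 8 * 8 * 4 = 1024 outputs
ranks = (1, 4, 4, 4, 1)    # assumed TT ranks, not taken from the thesis

dense = dense_params(in_modes, out_modes)   # 1,048,576 parameters
tt = tt_params(in_modes, out_modes, ranks)  # 2,176 parameters
ratio = dense / tt                          # roughly 482x fewer parameters

print(f"dense={dense}, tt={tt}, compression={ratio:.0f}x")
```

The ratio depends entirely on the chosen ranks: larger ranks preserve more of the original matrix but shrink the compression factor, so a parameter count like this says nothing about accuracy after compression.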

Abstract

This thesis addresses the technical challenges of applying machine learning to understand and interpret medical audio signals. The sounds of our lungs, heart, and voice convey vital information about our health. Yet, in contemporary medicine, these sounds are primarily analyzed through auditory interpretation by experts using devices like stethoscopes. Automated analysis offers the potential to standardize the processing of medical sounds, enable screening in low-resource settings where physicians are scarce, and detect subtle patterns that may elude human perception, thereby facilitating early diagnosis and treatment. Focusing on the analysis of infant cry sounds to predict medical conditions, this thesis contributes on four key fronts. First, in low-data settings, we demonstrate that large databases of adult speech can be harnessed through neural transfer learning to develop more…

Taxonomy

Topics: Speech Recognition and Synthesis · Phonocardiography and Auscultation Techniques · Music and Audio Processing