Towards Pre-training an Effective Respiratory Audio Foundation Model

Daisuke Niizumi; Daiki Takeuchi; Masahiro Yasuda; Binh Thien Nguyen; Yasunori Ohishi; Noboru Harada

arXiv:2505.15307·eess.AS·May 22, 2025

TL;DR

This paper investigates pre-training strategies for respiratory audio models. It finds that models pre-trained on AudioSet, a general audio dataset, outperform models pre-trained only on respiratory sounds, that further pre-training on AudioSet combined with respiratory data improves results again, and it sets a new state of the art on the OPERA benchmark.

Contribution

The paper demonstrates that pre-training on a general audio dataset (AudioSet) outperforms respiratory-specific pre-training, highlights that feature aggregation should preserve frequency-wise information, and establishes a new state of the art on the OPERA benchmark.

Findings

01

Models pre-trained on AudioSet outperform respiratory-only models.

02

Further pre-training on AudioSet combined with respiratory datasets improves performance beyond AudioSet alone.

03

Preserving frequency-wise information is crucial for effective feature aggregation.
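
To illustrate the third finding, here is a minimal sketch of two ways to aggregate an encoder's output into a single embedding. It is not the paper's implementation: the feature layout `(batch, channels, frequency, time)`, the function names, and the toy inputs are all assumptions made for illustration. The first function averages over frequency and time together, while the second averages over time only and keeps one value per channel and frequency bin.

```python
import numpy as np


def pool_freq_and_time(features: np.ndarray) -> np.ndarray:
    """Average over frequency and time: (B, C, F, T) -> (B, C).

    Hypothetical helper; frequency position is discarded.
    """
    return features.mean(axis=(2, 3))


def pool_time_keep_freq(features: np.ndarray) -> np.ndarray:
    """Average over time only: (B, C, F, T) -> (B, C * F).

    Hypothetical helper; each (channel, frequency bin) pair keeps its own value.
    """
    b, c, f, _ = features.shape
    return features.mean(axis=3).reshape(b, c * f)


# Toy inputs (assumed, not real encoder outputs): same energy, different frequency bin.
low = np.zeros((1, 1, 4, 10))
low[:, :, 0, :] = 1.0   # energy only in the lowest frequency bin
high = np.zeros((1, 1, 4, 10))
high[:, :, 3, :] = 1.0  # energy only in the highest frequency bin

# Pooling over frequency makes the two inputs indistinguishable.
print(np.allclose(pool_freq_and_time(low), pool_freq_and_time(high)))

# Keeping the frequency axis preserves the difference.
print(np.allclose(pool_time_keep_freq(low), pool_time_keep_freq(high)))
```

In this toy case the two inputs collapse to the same embedding under the first aggregation and stay distinct under the second, which is one way to read why frequency-wise information could matter for sounds that differ mainly in pitch or spectral position.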

Abstract

Recent advancements in foundation models have sparked interest in respiratory audio foundation models. However, the effectiveness of applying conventional pre-training schemes to datasets that are small-sized and lack diversity has not been sufficiently verified. This study aims to explore better pre-training practices for respiratory sounds by comparing numerous pre-trained audio models. Our investigation reveals that models pre-trained on AudioSet, a general audio dataset, are more effective than the models specifically pre-trained on respiratory sounds. Moreover, combining AudioSet and respiratory sound datasets for further pre-training enhances performance, and preserving the frequency-wise information when aggregating features is vital. Along with more insights found in the experiments, we establish a new state-of-the-art for the OPERA benchmark, contributing to advancing…

Code & Models

Repositories

nttcslab/eval-audio-repr
PyTorch · Official

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies