Towards Pre-training an Effective Respiratory Audio Foundation Model
Daisuke Niizumi, Daiki Takeuchi, Masahiro Yasuda, Binh Thien Nguyen, Yasunori Ohishi, Noboru Harada

TL;DR
This paper investigates pre-training strategies for respiratory audio models. It finds that models pre-trained on AudioSet, a general audio dataset, outperform models pre-trained only on respiratory sounds, and that further pre-training on AudioSet combined with respiratory data improves performance again. It also establishes a new state of the art on the OPERA benchmark.
Contribution
It demonstrates that pre-training on a general audio dataset (AudioSet) outperforms respiratory-specific pre-training, shows that feature aggregation should preserve frequency-wise information, and establishes a new state of the art on the OPERA benchmark.
Findings
Models pre-trained on AudioSet outperform respiratory-only models.
Combining AudioSet with respiratory datasets further improves performance.
Preserving frequency-wise information is crucial for effective feature aggregation.
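The last finding can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an encoder that emits one feature vector per time-frequency patch, with the hypothetical shape (F, T, D) for F frequency bands, T time frames, and D feature dimensions. The function names `pool_all` and `pool_time_keep_freq` are made up for illustration. The sketch contrasts averaging over both axes, which discards where in frequency a feature occurred, with averaging over time only and concatenating the per-band vectors.

```python
import numpy as np

def pool_all(patches: np.ndarray) -> np.ndarray:
    """Average over frequency and time: (F, T, D) -> (D,).

    Frequency position is lost: reordering the bands gives the same vector.
    """
    return patches.mean(axis=(0, 1))

def pool_time_keep_freq(patches: np.ndarray) -> np.ndarray:
    """Average over time only, then concatenate the bands: (F, T, D) -> (F * D,).

    Each frequency band keeps its own slice of the output vector.
    """
    return patches.mean(axis=1).reshape(-1)

# Hypothetical patch features; the sizes are arbitrary illustration values.
rng = np.random.default_rng(0)
F, T, D = 4, 10, 8
patches = rng.normal(size=(F, T, D))

global_vec = pool_all(patches)           # shape (8,)
freq_vec = pool_time_keep_freq(patches)  # shape (32,)

# Reversing the frequency axis leaves the global average unchanged,
# but changes the frequency-preserving representation.
flipped = patches[::-1]
same_global = np.allclose(pool_all(flipped), global_vec)
same_freq = np.allclose(pool_time_keep_freq(flipped), freq_vec)
print(global_vec.shape, freq_vec.shape, same_global, same_freq)
```

The frequency-preserving vector is F times larger, so a downstream classifier sees more input dimensions; that trade-off is a property of this sketch, not a claim about the paper's setup.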
Abstract
Recent advancements in foundation models have sparked interest in respiratory audio foundation models. However, the effectiveness of applying conventional pre-training schemes to datasets that are small-sized and lack diversity has not been sufficiently verified. This study aims to explore better pre-training practices for respiratory sounds by comparing numerous pre-trained audio models. Our investigation reveals that models pre-trained on AudioSet, a general audio dataset, are more effective than the models specifically pre-trained on respiratory sounds. Moreover, combining AudioSet and respiratory sound datasets for further pre-training enhances performance, and preserving the frequency-wise information when aggregating features is vital. Along with more insights found in the experiments, we establish a new state-of-the-art for the OPERA benchmark, contributing to advancing…
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies
