BTS: Bridging Text and Sound Modalities for Metadata-Aided Respiratory Sound Classification
June-Woo Kim, Miika Toikkanen, Yera Choi, Seoung-Eun Moon, Ho-Young Jung

TL;DR
This paper presents a multimodal model that combines text metadata and audio data to improve respiratory sound classification, achieving state-of-the-art results and demonstrating robustness when metadata is incomplete.
Contribution
Introduces a novel text-audio multimodal approach for respiratory sound classification that leverages metadata to enhance accuracy and robustness.
Findings
Achieves a 1.17% improvement over the previous best result on the ICBHI dataset.
Effective use of metadata improves classification performance.
The model remains robust when only partial metadata is available.
Abstract
Respiratory sound classification (RSC) is challenging due to varied acoustic signatures, primarily influenced by patient demographics and recording environments. To address this issue, we introduce a text-audio multimodal model that utilizes metadata of respiratory sounds, which provides useful complementary information for RSC. Specifically, we fine-tune a pretrained text-audio multimodal model using free-text descriptions derived from the sound samples' metadata which includes the gender and age of patients, type of recording devices, and recording location on the patient's body. Our method achieves state-of-the-art performance on the ICBHI dataset, surpassing the previous best result by a notable margin of 1.17%. This result validates the effectiveness of leveraging metadata and respiratory sound samples in enhancing RSC performance. Additionally, we investigate the model performance…
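The abstract describes deriving free-text descriptions from each sample's metadata (patient gender and age, recording device, recording location). The following is a minimal sketch of what such a step might look like, not the authors' implementation: the function name `metadata_to_text`, the sentence templates, and the fallback sentence are all hypothetical choices made for illustration. Fields that are unknown are simply skipped, which is one plausible way to handle partially available metadata.

```python
from typing import Optional


def metadata_to_text(
    sex: Optional[str] = None,
    age_group: Optional[str] = None,
    location: Optional[str] = None,
    device: Optional[str] = None,
) -> str:
    """Build a free-text description from whichever metadata fields are known.

    The sentence templates are illustrative assumptions, not the paper's prompts.
    """
    sentences = []
    if sex:
        sentences.append(f"The patient is {sex}.")
    if age_group:
        sentences.append(f"The patient's age group is {age_group}.")
    if location:
        sentences.append(f"The sound was recorded at the {location}.")
    if device:
        sentences.append(f"The recording device was {device}.")

    # Hypothetical fallback for samples with no metadata at all.
    if not sentences:
        return "No metadata is available."
    return " ".join(sentences)


# Full metadata versus partial metadata (age group and device missing).
full = metadata_to_text("male", "adult", "left anterior chest", "a digital stethoscope")
partial = metadata_to_text(sex="male", location="left anterior chest")
print(full)
print(partial)
```

The resulting string would then be passed to the text encoder of a pretrained text-audio model alongside the audio sample; that part is omitted here because the summary does not specify the model interface.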
Taxonomy
Topics: Music and Audio Processing · Phonocardiography and Auscultation Techniques · Diverse Musicological Studies
