Evaluating few-shot prompting for spectrogram-based lung sound classification using a multimodal language model
Nicholas Dietrich, David McShannon, Mark F. Rzepka

TL;DR
This study explores using a multimodal AI model, GPT-4o, to classify lung sounds from spectrograms, finding that providing a few examples improves performance slightly.
Contribution
Demonstrates that few-shot prompting improves lung sound classification performance using a general-purpose multimodal LLM.
Findings
Few-shot prompting improved accuracy (0.363 vs. 0.320) and other metrics over zero-shot prompting.
Model repeatability was high (κ = 0.76–0.88), indicating strong consistency.
Performance gains were statistically significant (p < 0.001) but insufficient for clinical use.
Abstract
Traditional deep learning models for lung sound analysis require large, labeled datasets, whereas multimodal large language models (LLMs) may offer a flexible, prompt-based alternative. This study aimed to evaluate the utility of a general-purpose multimodal LLM, GPT-4o, for lung sound classification from mel-spectrograms and assess whether a few-shot prompt approach improves performance over zero-shot prompting. Using the ICBHI 2017 Respiratory Sound Database, 6898 annotated respiratory cycles were converted into mel-spectrograms. GPT-4o was prompted to classify each spectrogram using both zero-shot and few-shot strategies. Model outputs were evaluated against ground truth labels using performance metrics including accuracy, precision, recall, and F1-score. Few-shot prompting improved overall accuracy (0.363 vs. 0.320) and yielded modest gains in precision (0.316 vs. 0.283), recall…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsPhonocardiography and Auscultation Techniques · COVID-19 diagnosis using AI · Machine Learning in Healthcare
