Evaluating few-shot prompting for spectrogram-based lung sound classification using a multimodal language model

Nicholas Dietrich; David McShannon; Mark F. Rzepka

PMC · DOI:10.1371/journal.pdig.0001179·January 7, 2026

Evaluating few-shot prompting for spectrogram-based lung sound classification using a multimodal language model

Nicholas Dietrich, David McShannon, Mark F. Rzepka

PDF

Open Access

TL;DR

This study explores using a multimodal AI model, GPT-4o, to classify lung sounds from spectrograms, finding that providing a few examples improves performance slightly.

Contribution

Demonstrates that few-shot prompting improves lung sound classification performance using a general-purpose multimodal LLM.

Findings

01

Few-shot prompting improved accuracy (0.363 vs. 0.320) and other metrics over zero-shot prompting.

02

Model repeatability was high (κ = 0.76–0.88), indicating strong consistency.

03

Performance gains were statistically significant (p < 0.001) but insufficient for clinical use.

Abstract

Traditional deep learning models for lung sound analysis require large, labeled datasets, whereas multimodal large language models (LLMs) may offer a flexible, prompt-based alternative. This study aimed to evaluate the utility of a general-purpose multimodal LLM, GPT-4o, for lung sound classification from mel-spectrograms and assess whether a few-shot prompt approach improves performance over zero-shot prompting. Using the ICBHI 2017 Respiratory Sound Database, 6898 annotated respiratory cycles were converted into mel-spectrograms. GPT-4o was prompted to classify each spectrogram using both zero-shot and few-shot strategies. Model outputs were evaluated against ground truth labels using performance metrics including accuracy, precision, recall, and F1-score. Few-shot prompting improved overall accuracy (0.363 vs. 0.320) and yielded modest gains in precision (0.316 vs. 0.283), recall…

Linked entities

Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.

Species1

Homo sapiens(human · species)

Chemicals1

GPT-4o

Diseases3

LLMs hallucinations Crackles

Figures6

Click any figure to enlarge with its caption.

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsPhonocardiography and Auscultation Techniques · COVID-19 diagnosis using AI · Machine Learning in Healthcare