CaReAQA: A Cardiac and Respiratory Audio Question Answering Model for Open-Ended Diagnostic Reasoning

Tsai-Ning Wang; Lin-Lin Chen; Neil Zeghidour; Aaqib Saeed

arXiv:2505.01199 · cs.LG · June 3, 2025

TL;DR

CaReAQA is an audio-language model that pairs medical audio analysis with the reasoning of large language models to produce open-ended diagnostic responses and to generalize across datasets.

Contribution

The paper introduces CaReAQA, a model integrating audio and language reasoning, and CaReSound, a benchmark dataset for medical audio diagnostic reasoning.

Findings

01. Achieves 86.2% accuracy on open-ended diagnostic tasks.

02. Generalizes with 56.9% accuracy on unseen datasets.

03. Outperforms baseline models in diagnostic reasoning.

Abstract

Medical audio signals, such as heart and lung sounds, play a crucial role in clinical diagnosis. However, analyzing these signals remains challenging: traditional methods rely on handcrafted features or supervised deep learning models that demand extensive labeled datasets, limiting their scalability and applicability. To address these issues, we propose CaReAQA, an audio-language model that integrates a foundation audio model with the reasoning capabilities of large language models, enabling clinically relevant, open-ended diagnostic responses. Alongside CaReAQA, we introduce CaReSound, a benchmark dataset of annotated medical audio recordings enriched with metadata and paired question-answer examples, intended to drive progress in diagnostic reasoning research. Evaluation results show that CaReAQA achieves 86.2% accuracy on open-ended diagnostic reasoning tasks, outperforming baseline…

Taxonomy

Topics: Music and Audio Processing · Phonocardiography and Auscultation Techniques