CaReAQA: A Cardiac and Respiratory Audio Question Answering Model for Open-Ended Diagnostic Reasoning
Tsai-Ning Wang, Lin-Lin Chen, Neil Zeghidour, Aaqib Saeed

TL;DR
CaReAQA is an audio-language model that combines medical audio analysis with large language model reasoning to improve open-ended diagnostic responses and to generalize across datasets.
Contribution
The paper introduces CaReAQA, a model that integrates audio and language reasoning, and CaReSound, a benchmark dataset for diagnostic reasoning over medical audio.
Findings
Achieves 86.2% accuracy on open-ended diagnostic tasks.
Generalizes with 56.9% accuracy on unseen datasets.
Outperforms baseline models in diagnostic reasoning.
Abstract
Medical audio signals, such as heart and lung sounds, play a crucial role in clinical diagnosis. However, analyzing these signals remains challenging: traditional methods rely on handcrafted features or supervised deep learning models that demand extensive labeled datasets, limiting their scalability and applicability. To address these issues, we propose CaReAQA, an audio-language model that integrates a foundation audio model with the reasoning capabilities of large language models, enabling clinically relevant, open-ended diagnostic responses. Alongside CaReAQA, we introduce CaReSound, a benchmark dataset of annotated medical audio recordings enriched with metadata and paired question-answer examples, intended to drive progress in diagnostic reasoning research. Evaluation results show that CaReAQA achieves 86.2% accuracy on open-ended diagnostic reasoning tasks, outperforming baseline…
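The abstract does not say how the audio foundation model is connected to the language model. One common pattern in audio-language models, assumed here purely for illustration, is to project the encoder's output frames through a learned linear layer into the LLM's token-embedding space and prepend them to the question tokens. In this sketch, random arrays stand in for real encoder and LLM outputs. The dimensions and the function names `project_audio` and `build_llm_input` are hypothetical and do not describe CaReAQA's actual implementation.

```python
import numpy as np

# Hypothetical sizes; the actual CaReAQA encoder and LLM widths are not given here.
AUDIO_DIM = 768   # width of each audio-encoder output frame (assumed)
LLM_DIM = 1024    # width of the LLM token embeddings (assumed)

rng = np.random.default_rng(0)


def project_audio(audio_feats: np.ndarray, w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Linearly map audio frames (T, AUDIO_DIM) into LLM embedding space (T, LLM_DIM)."""
    return audio_feats @ w + b


def build_llm_input(audio_feats: np.ndarray, question_embeds: np.ndarray,
                    w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Prepend projected audio 'tokens' to the question's token embeddings (prefix fusion)."""
    audio_tokens = project_audio(audio_feats, w, b)
    return np.concatenate([audio_tokens, question_embeds], axis=0)


# Stand-ins for real model outputs: 32 audio frames and a 12-token question.
audio_feats = rng.standard_normal((32, AUDIO_DIM))
question_embeds = rng.standard_normal((12, LLM_DIM))
w = rng.standard_normal((AUDIO_DIM, LLM_DIM)) * 0.02  # would be learned in practice
b = np.zeros(LLM_DIM)

llm_input = build_llm_input(audio_feats, question_embeds, w, b)
print(llm_input.shape)  # (44, 1024)
```

In a trained system of this kind, the combined sequence would be passed to the LLM, which then generates a free-text diagnostic answer. A single linear projection is the simplest choice; other audio-language models use small MLPs or query-based resamplers instead.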
Taxonomy
Topics: Music and Audio Processing · Phonocardiography and Auscultation Techniques
