RAMoEA-QA: Hierarchical Specialization for Robust Respiratory Audio Question Answering

Gaia A. Bertolino; Yuwei Zhang; Tong Xia; Domenico Talia; Cecilia Mascolo

arXiv:2603.06542·cs.SD·May 6, 2026

TL;DR

RAMoEA-QA is a hierarchical respiratory audio question answering model that enhances robustness and specialization across diverse clinical and self-recorded audio data, improving accuracy and transferability.

Contribution

The paper introduces RAMoEA-QA as the first hierarchical question answering model to support input-dependent specialization across heterogeneous respiratory audio recordings and query types.

Findings

01. Achieves 0.72 accuracy on in-domain discriminative tasks, outperforming baselines.

02. Demonstrates superior regression performance and transfer under dataset, modality, and task shifts.

03. Gains up to 23 percentage points in accuracy on COPD modality shift.

Abstract

Conversational generative AI is increasingly explored in healthcare, where models must integrate heterogeneous patient signals and support diverse interaction styles while producing clinically meaningful outputs. In respiratory care, non-invasive audio recordings captured with sensing devices offer a scalable route to screening and longitudinal monitoring, but heterogeneity is particularly acute: recordings vary across devices, environments, and acquisition protocols, and queries may vary in intent, answer format, and prediction objective. Existing biomedical audio-language question answering systems for respiratory assessment are starting to emerge, but they are typically built as single-path models, processing all inputs through the same acoustic and language pathway despite variation in recording conditions and query types. They are also usually evaluated in relatively limited…
