QoQ-Med: Building Multimodal Clinical Foundation Models with Domain-Aware GRPO Training
Wei Dai, Peilin Chen, Chanakya Ekbote, Paul Pu Liang

TL;DR
QoQ-Med is a multimodal clinical foundation model that jointly reasons over medical images, time-series signals, and text reports. It is trained with Domain-aware Relative Policy Optimization (DRPO), a novel reinforcement learning objective, to improve diagnostic accuracy and interpretability across diverse clinical data.
Contribution
Introduces QoQ-Med, the first open multimodal clinical foundation model with domain-aware RL training, addressing data imbalance and enhancing reasoning across modalities.
Findings
DRPO training improves macro-F1 by 43% on average across all visual domains compared with critic-free methods.
Highlights salient regions with IoU 10x higher than open models.
Performs comparably to OpenAI o4-mini on segmentation tasks.
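The two metrics named in these findings can be computed as in the minimal sketch below. It is an illustration only, not the paper's evaluation code: the function names `macro_f1` and `box_iou` are hypothetical, and representing a salient region as an axis-aligned box `(x1, y1, x2, y2)` is an assumption made here for simplicity.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2). Assumed box format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Because macro-F1 averages over classes rather than over samples, a model that ignores a rare diagnosis is penalized as heavily as one that ignores a common diagnosis, which is why it suits skewed clinical data.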
Abstract
Clinical decision-making routinely demands reasoning over heterogeneous data, yet existing multimodal language models (MLLMs) remain largely vision-centric and fail to generalize across clinical specialties. To bridge this gap, we introduce QoQ-Med-7B/32B, the first open generalist clinical foundation model that jointly reasons across medical images, time-series signals, and text reports. QoQ-Med is trained with Domain-aware Relative Policy Optimization (DRPO), a novel reinforcement-learning objective that hierarchically scales normalized rewards according to domain rarity and modality difficulty, mitigating performance imbalance caused by skewed clinical data distributions. Trained on 2.61 million instruction tuning pairs spanning 9 clinical domains, we show that DRPO training boosts diagnostic performance by 43% in macro-F1 on average across all visual domains as compared to other…
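The idea of scaling group-normalized rewards by domain rarity can be sketched as follows. This is a rough illustration under stated assumptions, not the DRPO objective itself: the abstract does not give the formula, so the inverse-frequency weighting, the `power` exponent, the scalar `difficulty` factor, and the domain names `cxr` and `ecg` are all hypothetical choices made here.

```python
import math


def group_normalize(rewards, eps=1e-6):
    """GRPO-style advantage: standardize rewards within one group of sampled responses."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]


def domain_weights(domain_counts, power=0.5):
    """Hypothetical rarity weights: inverse frequency raised to `power`, rescaled to mean 1."""
    raw = {d: (1.0 / c) ** power for d, c in domain_counts.items()}
    mean_raw = sum(raw.values()) / len(raw)
    return {d: w / mean_raw for d, w in raw.items()}


def scaled_advantages(rewards, domain, weights, difficulty=1.0):
    """Scale normalized advantages by an assumed rarity weight and difficulty factor."""
    return [weights[domain] * difficulty * a for a in group_normalize(rewards)]


# Hypothetical sample counts for a common and a rare domain.
weights = domain_weights({"cxr": 10000, "ecg": 100})
common = scaled_advantages([1.0, 0.0, 0.0, 1.0], "cxr", weights)
rare = scaled_advantages([1.0, 0.0, 0.0, 1.0], "ecg", weights)
```

With these assumed counts, identical reward patterns yield larger-magnitude advantages in the rare domain, so its gradient contribution is not drowned out by the common domain.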
Taxonomy
Topics: Biomedical Text Mining and Ontologies
