QoQ-Med: Building Multimodal Clinical Foundation Models with Domain-Aware GRPO Training

Wei Dai; Peilin Chen; Chanakya Ekbote; Paul Pu Liang

arXiv:2506.00711 · cs.LG · October 23, 2025 · 3 cites

TL;DR

QoQ-Med is a multimodal clinical foundation model that integrates medical images, signals, and text, trained with a novel reinforcement learning method to improve diagnostic accuracy and interpretability across diverse clinical data.

Contribution

Introduces QoQ-Med, the first open multimodal clinical foundation model with domain-aware RL training, addressing data imbalance and enhancing reasoning across modalities.

Findings

01. Achieves a 43% improvement in macro-F1, averaged across all visual domains, over other critic-free training methods.

02. Highlights salient regions with an IoU 10x higher than that of open models.

03. Performs comparably to OpenAI o4-mini on segmentation tasks.
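The two metrics named in these findings can be made concrete with a short sketch. It assumes single-label classification for macro-F1 and axis-aligned boxes given as `(x1, y1, x2, y2)` for IoU; the function names `macro_f1` and `box_iou` are illustrative and not taken from the paper, whose exact evaluation protocol is not described on this page.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Because macro-F1 averages per-class scores without weighting by class frequency, a model that ignores a rare class is penalized heavily, which is why it suits skewed clinical label distributions.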

Abstract

Clinical decision-making routinely demands reasoning over heterogeneous data, yet existing multimodal language models (MLLMs) remain largely vision-centric and fail to generalize across clinical specialties. To bridge this gap, we introduce QoQ-Med-7B/32B, the first open generalist clinical foundation model that jointly reasons across medical images, time-series signals, and text reports. QoQ-Med is trained with Domain-aware Relative Policy Optimization (DRPO), a novel reinforcement-learning objective that hierarchically scales normalized rewards according to domain rarity and modality difficulty, mitigating the performance imbalance caused by skewed clinical data distributions. Training on 2.61 million instruction-tuning pairs spanning 9 clinical domains, we show that DRPO training boosts diagnostic performance by 43% in macro-F1 on average across all visual domains as compared to other…
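The idea of scaling normalized rewards by domain rarity and difficulty can be illustrated with a rough sketch. This is not the DRPO objective from the paper: the abstract does not give the formula, so the weighting below is an assumption made purely for illustration. It assumes rewards in [0, 1], measures rarity as `(total prompts / prompts in domain) ** alpha`, and measures difficulty as one minus the domain's mean reward. The names `group_normalize`, `domain_scaled_advantages`, and `alpha` are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def group_normalize(rewards, eps=1e-6):
    """GRPO-style normalization: center and scale rewards within one prompt's rollouts."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def domain_scaled_advantages(batch, alpha=0.5, eps=1e-6):
    """Scale group-normalized advantages by an assumed per-domain factor.

    batch: list of (domain, rewards), where rewards holds the rollout rewards
    (assumed to lie in [0, 1]) for a single prompt.
    """
    # Collect the mean reward of every prompt, grouped by domain.
    prompt_means = defaultdict(list)
    for domain, rewards in batch:
        prompt_means[domain].append(mean(rewards))

    total = len(batch)
    scale = {}
    for domain, means in prompt_means.items():
        rarity = (total / len(means)) ** alpha   # assumed: rarer domain, larger weight
        difficulty = 1.0 - mean(means) + eps     # assumed: lower reward, harder domain
        scale[domain] = rarity * difficulty

    return [
        (domain, [scale[domain] * a for a in group_normalize(rewards, eps)])
        for domain, rewards in batch
    ]
```

Under these assumptions, a prompt from a rare, low-reward domain contributes larger-magnitude advantages to the policy update than one from a common, already well-solved domain, which is the imbalance-mitigating effect the abstract describes.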

Taxonomy

Topics: Biomedical Text Mining and Ontologies