MedMO: Grounding and Understanding Multimodal Large Language Model for Medical Images
Ankan Deria, Komal Kumar, Adinath Madhavrao Dukre, Eran Segal, Salman Khan, Imran Razzak

TL;DR
MedMO is a specialized multimodal large language model designed for medical images, trained on domain-specific data, and capable of improved reasoning, grounding, and performance across various medical tasks and modalities.
Contribution
The paper introduces MedMO, a novel medical multimodal foundation model with multi-stage training, domain-specific data, and reinforcement learning for grounded reasoning, surpassing existing medical baselines.
Findings
MedMO-8B-Next improves VQA benchmarks by 6.6% on average.
MedMO enhances medical report generation by 6.7%.
MedMO achieves 56.1 IoU on the Bacteria grounding task.
Abstract
Multimodal large language models have advanced rapidly, but their adoption in medicine is constrained by limited domain coverage, imperfect modality alignment, and insufficient grounded reasoning. We introduce MedMO, a medical multimodal foundation model built on a general MLLM architecture and trained exclusively on large-scale domain-specific data. MedMO uses a multi-stage training recipe that includes cross-modal pretraining to align heterogeneous visual encoders with a medical language backbone, instruction tuning with multi-task supervision spanning captioning, VQA, report generation, retrieval, and bounding-box disease localization, and reinforcement learning with verifiable rewards that combine factuality checks with a box-level GIoU signal to improve spatial grounding and step-by-step reasoning in challenging clinical settings. Across modalities and tasks, MedMO surpasses strong…
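The abstract mentions a box-level GIoU signal in the reinforcement learning reward but does not say how it is computed or weighted. As a rough illustration only, the sketch below implements the standard Generalized IoU definition for two axis-aligned boxes: IoU minus the fraction of the smallest enclosing box not covered by the union. The `(x1, y1, x2, y2)` box format, the function names, and the choice to return 0.0 for degenerate boxes are assumptions made for this example, not details taken from the paper.

```python
from typing import Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
Box = Tuple[float, float, float, float]


def box_area(box: Box) -> float:
    """Area of an axis-aligned box; inverted boxes count as empty."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def giou(pred: Box, target: Box) -> float:
    """Generalized IoU in [-1, 1]; 1 means the boxes coincide."""
    # Intersection rectangle (empty if the boxes do not overlap).
    inter = box_area((
        max(pred[0], target[0]),
        max(pred[1], target[1]),
        min(pred[2], target[2]),
        min(pred[3], target[3]),
    ))
    union = box_area(pred) + box_area(target) - inter

    # Smallest axis-aligned box enclosing both inputs.
    enclose = box_area((
        min(pred[0], target[0]),
        min(pred[1], target[1]),
        max(pred[2], target[2]),
        max(pred[3], target[3]),
    ))

    # Assumption for this sketch: degenerate inputs yield a neutral score.
    if union <= 0.0 or enclose <= 0.0:
        return 0.0

    iou = inter / union
    # Penalize the part of the enclosing box not covered by the union.
    return iou - (enclose - union) / enclose


if __name__ == "__main__":
    print(giou((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes
    print(giou((0, 0, 2, 2), (1, 1, 3, 3)))  # partial overlap
    print(giou((0, 0, 1, 1), (2, 2, 3, 3)))  # disjoint boxes
```

Unlike plain IoU, which is zero for every pair of non-overlapping boxes, GIoU goes negative and keeps decreasing as the boxes move apart, so a predicted box that misses the target entirely still receives a graded signal.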
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Machine Learning in Healthcare
