CrossMed: A Multimodal Cross-Task Benchmark for Compositional Generalization in Medical Imaging

Pooja Singh; Siddhant Ujjain; Tapan Kumar Gandhi; Sandeep Kumar

arXiv:2511.11034 · cs.CV · November 17, 2025

TL;DR

CrossMed introduces a comprehensive benchmark to evaluate the ability of multimodal medical AI models to generalize compositionally across unseen combinations of imaging modalities, anatomical structures, and tasks using a unified VQA format.

Contribution

This work presents CrossMed, a novel benchmark reformulating multiple medical imaging datasets into a VQA format to assess compositional generalization in multimodal LLMs.

Findings

01. Models perform well on related splits but struggle with unrelated and zero-overlap splits.

02. Cross-task transfer improves segmentation performance by 7% cIoU.

03. Multimodal LLMs excel at compositional generalization compared to traditional models.
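
The segmentation finding is reported in cIoU, which this summary does not define. The sketch below assumes cIoU means cumulative IoU (total intersection divided by total union over all evaluated masks), a common convention in language-guided segmentation; the paper may define it differently. The function name `cumulative_iou` and the flat 0/1 mask representation are illustrative choices, not taken from the paper.

```python
def cumulative_iou(preds, targets):
    """Cumulative IoU over a set of binary masks.

    Assumption: cIoU = (sum of intersections) / (sum of unions) across
    all samples. Each mask is a flat sequence of 0/1 values.
    """
    total_inter = 0
    total_union = 0
    for pred, target in zip(preds, targets):
        if len(pred) != len(target):
            raise ValueError("prediction and target masks must match in size")
        for p, t in zip(pred, target):
            total_inter += 1 if (p and t) else 0
            total_union += 1 if (p or t) else 0
    # Convention chosen here: empty predictions on empty targets count as perfect.
    return total_inter / total_union if total_union else 1.0


# Two toy masks: one perfect prediction on a large region,
# one complete miss on a small region.
preds = [[1, 1, 1, 1], [1, 0, 0, 0]]
targets = [[1, 1, 1, 1], [0, 1, 0, 0]]
score = cumulative_iou(preds, targets)  # 4 / 6
```

Under this assumed definition, large regions weigh more than small ones: the per-mask IoUs here are 1.0 and 0.0 (mean 0.5), while the cumulative value is 4/6. The summary also does not say whether the reported 7% is an absolute or a relative gain.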

Abstract

Recent advances in multimodal large language models have enabled unified processing of visual and textual inputs, offering promising applications in general-purpose medical AI. However, their ability to generalize compositionally across unseen combinations of imaging modality, anatomy, and task type remains underexplored. We introduce CrossMed, a benchmark designed to evaluate compositional generalization (CG) in medical multimodal LLMs using a structured Modality-Anatomy-Task (MAT) schema. CrossMed reformulates four public datasets (CheXpert for X-ray classification, SIIM-ACR for X-ray segmentation, BraTS 2020 for MRI classification and segmentation, and MosMedData for CT classification) into a unified visual question answering (VQA) format, resulting in 20,200 multiple-choice QA instances. We evaluate two open-source multimodal LLMs, LLaVA-Vicuna-7B and Qwen2-VL-7B, on both Related and…
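
One way to picture the Modality-Anatomy-Task schema is as a set of triples, with a held-out triple counting as a compositional test when each of its elements has been seen in training but the exact combination has not. The sketch below is illustrative only: the `MAT` class, the helper names, and the anatomy labels are assumptions, and the paper's own definitions of its Related, Unrelated, and zero-overlap splits are not given in this summary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MAT:
    """A hypothetical Modality-Anatomy-Task triple."""
    modality: str
    anatomy: str
    task: str


def is_compositionally_novel(target, train):
    """True if every element of `target` appears somewhere in `train`
    but the full triple does not (an assumed, generic CG criterion)."""
    if target in train:
        return False
    return (
        any(t.modality == target.modality for t in train)
        and any(t.anatomy == target.anatomy for t in train)
        and any(t.task == target.task for t in train)
    )


def shared_elements(a, b):
    """Count how many of the three MAT elements two triples share."""
    return sum([a.modality == b.modality, a.anatomy == b.anatomy, a.task == b.task])


# Training triples loosely modelled on the datasets named in the abstract
# (anatomy labels are assumptions, not quoted from the paper).
train = {
    MAT("X-ray", "chest", "classification"),  # e.g. CheXpert
    MAT("MRI", "brain", "classification"),    # e.g. BraTS 2020
    MAT("MRI", "brain", "segmentation"),      # e.g. BraTS 2020
    MAT("CT", "chest", "classification"),     # e.g. MosMedData
}
target = MAT("X-ray", "chest", "segmentation")  # e.g. SIIM-ACR

novel = is_compositionally_novel(target, train)
overlap = shared_elements(target, MAT("MRI", "brain", "classification"))
```

Here `novel` is `True` because X-ray, chest, and segmentation each occur in training, just never together, while `overlap` is `0` for a training triple that shares no element with the target.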

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling