MedKCO: Medical Vision-Language Pretraining via Knowledge-Driven Cognitive Orchestration

Chenran Zhang; Ruiqi Wu; Tao Zhou; Yi Zhou

arXiv:2603.09101 · cs.CV · March 11, 2026

TL;DR

MedKCO introduces a curriculum-based medical vision-language pretraining approach that improves feature representations and generalization across diverse medical imaging tasks by leveraging knowledge-driven data ordering and adaptive contrastive learning.

Contribution

It proposes a novel knowledge-driven curriculum and a self-paced contrastive loss for medical VLP, enhancing model performance and robustness under distribution shifts.

Findings

01. Significantly outperforms baseline methods across multiple tasks.

02. Effective in handling distribution shifts in medical imaging.

03. Demonstrates the benefit of curriculum learning in medical VLP.

Abstract

Medical vision-language pretraining (VLP) models have recently been investigated for their generalization to diverse downstream tasks. However, current medical VLP methods typically force the model to learn simple and complex concepts simultaneously. This anti-cognitive process leads to suboptimal feature representations, especially under distribution shift. To address this limitation, we propose a Knowledge-driven Cognitive Orchestration for Medical VLP (MedKCO) that involves both the ordering of the pretraining data and the learning objective of vision-language contrast. Specifically, we design a two-level curriculum by incorporating diagnostic sensitivity and intra-class sample representativeness for the ordering of the pretraining data. Moreover, considering the inter-class similarity of medical images, we introduce a self-paced asymmetric contrastive loss to dynamically adjust the…
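
The two-level data ordering described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how diagnostic sensitivity or representativeness is computed, so the `Sample` fields, the score ranges, and the rule "lower sensitivity first, then more representative first" are all assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One image-text pair with two hypothetical, precomputed scores."""
    sample_id: str
    label: str
    # Assumption: a per-class difficulty proxy; lower values are scheduled earlier.
    sensitivity: float
    # Assumption: how typical the sample is of its class; higher values come first.
    representativeness: float


def order_curriculum(samples: List[Sample]) -> List[Sample]:
    """Sort by the first-level score, then by the second-level score within ties.

    Level 1: ascending sensitivity (assumed easy-to-hard ordering).
    Level 2: descending representativeness within the same sensitivity level.
    """
    return sorted(samples, key=lambda s: (s.sensitivity, -s.representativeness))


if __name__ == "__main__":
    data = [
        Sample("a", "disease", 0.9, 0.2),
        Sample("b", "normal", 0.1, 0.5),
        Sample("c", "normal", 0.1, 0.9),
        Sample("d", "disease", 0.9, 0.8),
    ]
    for s in order_curriculum(data):
        print(s.sample_id, s.label, s.sensitivity, s.representativeness)
```

A single composite sort key keeps the sketch short. A real pipeline would more likely release the data in stages rather than as one sorted list, but the abstract does not specify the schedule.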

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis