MedKCO: Medical Vision-Language Pretraining via Knowledge-Driven Cognitive Orchestration
Chenran Zhang, Ruiqi Wu, Tao Zhou, Yi Zhou

TL;DR
MedKCO introduces a curriculum-based medical vision-language pretraining approach that improves feature representations and generalization across diverse medical imaging tasks by leveraging knowledge-driven data ordering and adaptive contrastive learning.
Contribution
The paper proposes a knowledge-driven curriculum and a self-paced asymmetric contrastive loss for medical VLP, aimed at improving model performance and robustness under distribution shift.
Findings
Significantly outperforms baseline methods across multiple tasks.
Effective in handling distribution shifts in medical imaging.
Demonstrates the benefit of curriculum learning in medical VLP.
Abstract
Medical vision-language pretraining (VLP) models have recently been investigated for their generalization to diverse downstream tasks. However, current medical VLP methods typically force the model to learn simple and complex concepts simultaneously. This anti-cognitive process leads to suboptimal feature representations, especially under distribution shift. To address this limitation, we propose a Knowledge-driven Cognitive Orchestration for Medical VLP (MedKCO) that involves both the ordering of the pretraining data and the learning objective of vision-language contrast. Specifically, we design a two-level curriculum by incorporating diagnostic sensitivity and intra-class sample representativeness for the ordering of the pretraining data. Moreover, considering the inter-class similarity of medical images, we introduce a self-paced asymmetric contrastive loss to dynamically adjust the…
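To make the two-level ordering idea concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not say how diagnostic sensitivity or representativeness is computed, so the function name `two_level_curriculum_order`, the per-class `class_difficulty` scores (lower means presented earlier), and the use of cosine similarity to the class centroid as a representativeness proxy are all assumptions made for illustration.

```python
import numpy as np


def two_level_curriculum_order(features, labels, class_difficulty):
    """Return sample indices ordered easy-to-hard (hypothetical sketch).

    features: (n_samples, dim) array of precomputed image embeddings.
    labels: length-n_samples sequence of class names.
    class_difficulty: dict mapping class name -> assumed difficulty score
        (a stand-in for diagnostic sensitivity; lower is presented earlier).
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    order = []

    # Level 1: visit classes from lowest to highest assumed difficulty.
    for cls in sorted(class_difficulty, key=class_difficulty.get):
        idx = np.flatnonzero(labels == cls)
        if idx.size == 0:
            continue

        # Level 2: within a class, put the most representative samples first.
        # Representativeness is approximated here by cosine similarity to the
        # class centroid (an assumption, not taken from the paper).
        class_feats = features[idx]
        centroid = class_feats.mean(axis=0)
        norms = np.linalg.norm(class_feats, axis=1) * np.linalg.norm(centroid)
        sims = class_feats @ centroid / (norms + 1e-12)

        order.extend(idx[np.argsort(-sims, kind="stable")].tolist())

    return order
```

The returned index list could then drive a sequential sampler during pretraining. How the curriculum is actually paced over training steps is not stated in the truncated abstract and is left out of the sketch.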
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis
