ClinKD: Cross-Modal Clinical Knowledge Distiller For Multi-Task Medical Images

Hongyu Ge; Longkun Hao; Zihui Xu; Zhenxin Lin; Bin Li; Shoujun Zhou; Hongjin Zhao; Yihang Liu

arXiv:2502.05928 · cs.CV · July 14, 2025

TL;DR

ClinKD is a framework that improves multimodal large language models on medical visual question answering (Med-VQA) by strengthening image-text alignment and medical knowledge transfer; the authors report state-of-the-art results on Med-VQA datasets.

Contribution

Introduces ClinKD, a cross-modal knowledge distillation framework that addresses image-text misalignment and domain knowledge gaps in medical VQA tasks.

Findings

01. Achieves state-of-the-art performance on challenging Med-VQA datasets.

02. Significantly improves image-text alignment in medical multimodal models.

03. Enables better medical knowledge adaptation in large language models.

Abstract

Medical Visual Question Answering (Med-VQA) represents a critical and challenging subtask within the general VQA domain. Despite significant advancements in general VQA, multimodal large language models (MLLMs) still exhibit substantial limitations when handling multi-task VQA scenarios. These limitations manifest as erroneous spatial localization and misinterpretation of medical images, which primarily arise from two fundamental issues: inadequate image-text alignment and insufficient domain-specific knowledge for medical applications. To address these issues, we introduce the Cross-Modal Clinical Knowledge Distiller (ClinKD), an innovative framework designed to enhance image-text alignment and establish more effective medical knowledge transformation mechanisms, which enables MLLMs to perform better even when lacking prior medical knowledge. Our extensive experimental…

Code & Models

Repositories

overloadedhenry/clinkd (PyTorch, official)

Taxonomy

Topics: AI in cancer detection