MedP-CLIP: Medical CLIP with Region-Aware Prompt Integration

Jiahui Peng; He Yao; Jingwen Li; Yanzhou Su; Sibo Ju; Yujie Lu; Jin Ye; Hongchun Lu; Xue Li; Lincheng Jiang; Min Zhu; Junlong Cheng

arXiv:2604.11197 · cs.CV · April 14, 2026

TL;DR

MedP-CLIP is a region-aware medical vision-language model that enhances fine-grained understanding of anatomical regions in medical images through region prompt integration and large-scale pre-training.

Contribution

MedP-CLIP introduces a feature-level region prompt mechanism and integrates medical prior knowledge, so the model can focus on a prompted region while retaining global context in medical image analysis.

Findings

01. Outperforms baseline methods in zero-shot recognition, segmentation, and multimodal tasks.

02. Pre-trained on over 6.4 million images with 97.3 million annotations.

03. Demonstrates significant improvements in medical image understanding and regional analysis.

Abstract

Contrastive Language-Image Pre-training (CLIP) has demonstrated outstanding performance in global image understanding and zero-shot transfer through large-scale text-image alignment. However, the core of medical image analysis often lies in the fine-grained understanding of specific anatomical structures or lesion regions. Therefore, precisely comprehending region-of-interest (RoI) information provided by medical professionals or perception models becomes crucial. To address this need, we propose MedP-CLIP, a region-aware medical vision-language model (VLM). MedP-CLIP innovatively integrates medical prior knowledge and designs a feature-level region prompt integration mechanism, enabling it to flexibly respond to various prompt forms (e.g., points, bounding boxes, masks) while maintaining global contextual awareness when focusing on local regions. We pre-train the model on a…
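
The abstract does not spell out how the region prompt is integrated, so the following is only an illustrative sketch of what a feature-level integration of point, box, and mask prompts could look like, not the authors' implementation. It assumes a ViT-style encoder with a 14x14 patch grid (224-pixel input, 16-pixel patches), and every name in it (`prompt_to_patch_mask`, `integrate_region_prompt`, `region_embedding`) is hypothetical. The idea shown is that each prompt type is rasterized to a patch-level mask and a region embedding is added only to the tokens inside that mask, so tokens outside the region are left untouched and global context is preserved.

```python
import numpy as np


def prompt_to_patch_mask(prompt, image_size=224, patch_size=16):
    """Rasterize a (kind, value) prompt onto the patch grid (hypothetical helper).

    kind is "point" (x, y), "box" (x0, y0, x1, y1) or "mask" (H x W binary array),
    all in pixel coordinates. Returns a flat vector with 1.0 for covered patches.
    """
    grid = image_size // patch_size
    mask = np.zeros((grid, grid), dtype=np.float32)
    kind, value = prompt
    if kind == "point":
        x, y = value
        mask[y // patch_size, x // patch_size] = 1.0
    elif kind == "box":
        x0, y0, x1, y1 = value
        # Floor the start and ceil the end so partially covered patches count.
        r0, r1 = y0 // patch_size, -(-y1 // patch_size)
        c0, c1 = x0 // patch_size, -(-x1 // patch_size)
        mask[r0:r1, c0:c1] = 1.0
    elif kind == "mask":
        m = np.asarray(value, dtype=np.float32)
        # A patch is marked if any pixel inside it belongs to the mask.
        m = m.reshape(grid, patch_size, grid, patch_size)
        mask = m.max(axis=(1, 3))
    else:
        raise ValueError(f"unknown prompt kind: {kind}")
    return mask.reshape(-1)


def integrate_region_prompt(patch_tokens, patch_mask, region_embedding):
    """Add the region embedding to tokens inside the prompt; leave the rest as-is.

    patch_tokens: (num_patches, dim); patch_mask: (num_patches,);
    region_embedding: (dim,). In a real model the embedding would be learned;
    here it is just an input array (an assumption for illustration).
    """
    return patch_tokens + patch_mask[:, None] * region_embedding[None, :]


# Usage: a box prompt marks 4 of the 196 patches and only those tokens change.
tokens = np.zeros((196, 8), dtype=np.float32)
box_mask = prompt_to_patch_mask(("box", (32, 48, 64, 80)))
fused = integrate_region_prompt(tokens, box_mask, np.ones(8, dtype=np.float32))
```

Because the prompt acts on features rather than cropping pixels, the encoder in this sketch still sees the whole image. Whether MedP-CLIP uses additive embeddings, attention biasing, or another mechanism is not stated in the text above.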
