Prompt as Knowledge Bank: Boost Vision-language model via Structural Representation for zero-shot medical detection

Yuguang Yang; Tongfei Chen; Haoyu Huang; Linlin Yang; Chunyu Xie; Dawei Leng; Xianbin Cao; Baochang Zhang

arXiv:2502.16223·cs.CV·February 25, 2025

Open Access

TL;DR

This paper introduces StructuralGLIP, a novel approach that enhances zero-shot medical detection by encoding prompts into a knowledge bank for more precise image-text alignment, significantly improving detection accuracy.

Contribution

The paper proposes a new method that encodes prompts into a layered knowledge bank, enabling fine-grained, context-aware alignment between images and disease descriptions for zero-shot detection.

Findings

1. Achieves a +4.1% AP improvement over the state of the art across seven benchmarks.

2. Improves fine-tuned models by +3.2% AP on endoscopy datasets.

3. Demonstrates that structural representation is effective for medical image-text alignment.

Abstract

Zero-shot medical detection can improve detection performance without relying on annotated medical images, even on top of a fine-tuned model, which gives it great clinical value. Recent studies achieve this with grounded vision-language models (GLIP) by using detailed disease descriptions as prompts for the target disease name during inference. However, these methods typically treat prompts as context equivalent to the target name, which makes it difficult to assign specific disease knowledge based on visual information and leads to a coarse alignment between images and target descriptions. In this paper, we propose StructuralGLIP, which introduces an auxiliary branch to encode prompts into a latent knowledge bank layer-by-layer, enabling more context-aware and fine-grained alignment. Specifically, in each layer, we select highly similar features from both the image…
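
The abstract is cut off before it specifies how features are selected, so the following is only an illustrative sketch of one possible reading of "select highly similar features": picking the knowledge-bank entries with the highest cosine similarity to any image feature. The function names (`cosine`, `select_from_bank`), the top-k rule, and the toy vectors are all assumptions made for illustration; they are not taken from the paper or from the GLIP codebase.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_from_bank(image_feats, bank, top_k):
    """Return sorted indices of the top_k bank entries most similar to any image feature.

    Hypothetical selection rule: each bank entry is scored by its best
    cosine similarity against the image features, and the top_k scores win.
    Expects at least one image feature.
    """
    scores = []
    for idx, entry in enumerate(bank):
        best = max(cosine(entry, feat) for feat in image_feats)
        scores.append((best, idx))
    # Highest score first; ties broken by lower index for determinism.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return sorted(idx for _, idx in scores[:top_k])


# Toy data: two image features and a three-entry "knowledge bank" of prompt features.
image_feats = [[1.0, 0.0], [0.9, 0.1]]
bank = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
selected = select_from_bank(image_feats, bank, top_k=2)
print(selected)
```

In this toy setup the second bank entry is nearly orthogonal to both image features, so it is the one left out. In a layer-by-layer design, a selection of this kind would presumably be repeated at each layer with that layer's features, but the truncated abstract does not confirm the details.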

Taxonomy

Topics: Machine Learning in Healthcare · Topic Modeling