Anatomy-VLM: A Fine-grained Vision-Language Model for Medical Interpretation

Difei Gu; Yunhe Gao; Mu Zhou; Dimitris Metaxas

arXiv:2511.08402 · cs.CV · November 12, 2025


TL;DR

Anatomy-VLM is a novel fine-grained vision-language model designed for medical interpretation, integrating anatomical details and structured knowledge to improve disease diagnosis and enable zero-shot interpretation.

Contribution

The paper introduces a multi-scale, anatomy-aware model that localizes key anatomical features and aligns medical information to improve interpretability and diagnostic accuracy.

Findings

01. Achieves high performance on in- and out-of-distribution datasets.

02. Improves downstream image segmentation tasks.

03. Enables zero-shot anatomy-wise interpretation.

Abstract

Accurate disease interpretation from radiology remains challenging due to imaging heterogeneity. Achieving expert-level diagnostic decisions requires integrating subtle image features with clinical knowledge. Yet mainstream vision-language models (VLMs) treat images as holistic entities and overlook fine-grained image details that are vital for disease diagnosis. Clinicians, by contrast, analyze images by drawing on prior medical knowledge to identify anatomical structures as important regions of interest (ROIs). Inspired by this human-centric workflow, we introduce Anatomy-VLM, a fine-grained vision-language model that incorporates multi-scale information. First, we design a model encoder to localize key anatomical features from entire medical images. Second, these regions are enriched with structured knowledge for contextually aware interpretation. Finally, the model encoder aligns…
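The abstract describes a pipeline of localizing anatomical regions, attaching knowledge to them, and aligning them, but it is truncated before stating what the encoder aligns. The following is a minimal, hypothetical NumPy sketch of one common way to realize region-level vision-language alignment, not the authors' implementation. It assumes that per-anatomy binary masks are already available, that image and text features have been projected to a shared dimension, and that alignment uses a CLIP-style symmetric InfoNCE loss. The function names `masked_average_pool`, `region_text_contrastive_loss`, and `zero_shot_labels` are illustrative inventions.

```python
import numpy as np


def masked_average_pool(feature_map, masks):
    """Pool a (C, H, W) feature map into one C-dim vector per anatomy mask.

    Hypothetical inputs: masks has shape (K, H, W) with values in {0, 1},
    one mask per anatomical structure. Returns an array of shape (K, C).
    """
    c = feature_map.shape[0]
    flat_feats = feature_map.reshape(c, -1)         # (C, H*W)
    flat_masks = masks.reshape(masks.shape[0], -1)  # (K, H*W)
    area = flat_masks.sum(axis=1, keepdims=True)    # pixels per region, (K, 1)
    area = np.maximum(area, 1.0)                    # empty masks pool to zeros, not NaN
    return (flat_masks @ flat_feats.T) / area       # mean feature inside each mask


def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def region_text_contrastive_loss(region_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE: region k should match text description k.

    Assumes region_emb and text_emb are both (K, D) in a shared space.
    """
    r = l2_normalize(region_emb)
    t = l2_normalize(text_emb)
    logits = r @ t.T / temperature  # (K, K) scaled cosine similarities

    def diag_cross_entropy(lg):
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))      # matching pairs lie on the diagonal

    # Average the region-to-text and text-to-region directions.
    return 0.5 * (diag_cross_entropy(logits) + diag_cross_entropy(logits.T))


def zero_shot_labels(region_emb, prompt_emb):
    """Assign each region the index of its most similar text prompt."""
    sims = l2_normalize(region_emb) @ l2_normalize(prompt_emb).T
    return sims.argmax(axis=1)
```

In this sketch, pooling inside each mask turns a whole-image feature map into one embedding per anatomical structure, which is the "fine-grained" unit the abstract contrasts with holistic image embeddings. The same similarity matrix used for training also supports a zero-shot readout: encoding one text prompt per candidate finding and taking the argmax gives an anatomy-wise prediction without task-specific training.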


Taxonomy

Topics: Multimodal Machine Learning Applications · COVID-19 diagnosis using AI · Machine Learning in Healthcare