Boosting Vision Semantic Density with Anatomy Normality Modeling for Medical Vision-language Pre-training

Weiwei Cao; Jianpeng Zhang; Zhongyi Shui; Sinuo Wang; Zeli Chen; Xi Li; Le Lu; Xianghua Ye; Tingbo Liang; Qi Zhang; Ling Zhang

arXiv:2508.03742 · eess.IV · August 7, 2025

TL;DR

This paper proposes a medical vision-language pre-training method that raises the semantic density of visual features by modeling normal anatomy and separating abnormal signals from it, which improves diagnostic accuracy and yields state-of-the-art zero-shot performance.

Contribution

The paper proposes a new method combining disease-level contrastive learning and anatomical normality modeling using VQ-VAE to improve alignment between medical images and reports.

Findings

01. Achieved an average AUC of 84.9% across 54 diseases in multiple organs.

02. Surpassed existing methods in zero-shot diagnosis performance.

03. Demonstrated superior transfer learning capabilities.

Abstract

Vision-language pre-training (VLP) has great potential for developing multifunctional and general medical diagnostic capabilities. However, aligning medical images with a low signal-to-noise ratio (SNR) to reports with a high SNR presents a semantic density gap, leading to visual alignment bias. In this paper, we propose boosting vision semantic density to improve alignment effectiveness. On one hand, we enhance visual semantics through disease-level vision contrastive learning, which strengthens the model's ability to differentiate between normal and abnormal samples for each anatomical structure. On the other hand, we introduce an anatomical normality modeling method to model the distribution of normal samples for each anatomy, leveraging VQ-VAE for reconstructing normal vision embeddings in the latent space. This process amplifies abnormal signals by leveraging distribution shifts in…
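
To make the normality-modeling idea more concrete, the sketch below shows only the vector-quantization step that a VQ-VAE relies on: an embedding is replaced by its nearest codebook entry, and the distance between the two can serve as a rough signal of how far the embedding lies from the modeled "normal" distribution. This is not the authors' implementation. The codebook values, the function names `nearest_code` and `abnormality_score`, and the use of the quantization residual as an abnormality score are all assumptions made for illustration; the abstract above is truncated before it states how the distribution shift is actually used.

```python
import math


def nearest_code(embedding, codebook):
    """Return (index, code) of the codebook vector closest to `embedding` (L2 distance)."""
    best_idx = min(
        range(len(codebook)),
        key=lambda i: math.dist(embedding, codebook[i]),
    )
    return best_idx, codebook[best_idx]


def abnormality_score(embedding, codebook):
    """Distance between an embedding and its quantized version.

    Assumption: the codebook was fit on normal samples only, so a large
    residual suggests the embedding is far from the normal distribution.
    """
    _, code = nearest_code(embedding, codebook)
    return math.dist(embedding, code)


# Hypothetical 2-D codebook standing in for one learned from normal anatomy embeddings.
normal_codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

normal_like = [0.9, 0.1]    # close to a codebook entry
abnormal_like = [3.0, 3.0]  # far from every codebook entry

score_normal = abnormality_score(normal_like, normal_codebook)
score_abnormal = abnormality_score(abnormal_like, normal_codebook)
```

In this toy setting `score_abnormal` is larger than `score_normal`, which is the qualitative behavior the abstract attributes to normality modeling. A real VQ-VAE would also train an encoder, a decoder, and the codebook itself, none of which are shown here.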
