A multi-modal vision-language model for generalizable annotation-free pathology localization

Hao Yang; Hong-Yu Zhou; Jiarun Liu; Weijian Huang; Cheng Li; Zhihuan Li; Yuanxu Gao; Qiegen Liu; Yong Liang; Qi Yang; Song Wu; Tao Tan; Hairong Zheng; Kang Zhang; and Shanshan Wang

arXiv:2401.02044 · cs.CV · November 11, 2025 · 1 cites

TL;DR

This paper introduces AFLoc, a multi-modal vision-language model that achieves generalizable, annotation-free pathology localization across various medical imaging modalities, outperforming existing methods and even surpassing human benchmarks.

Contribution

The paper presents a novel contrastive learning framework that aligns multi-granularity medical concepts with image features, enabling pathology localization without expert annotations.
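As a rough illustration of what contrastive alignment between image and text features can look like, the sketch below computes a symmetric InfoNCE-style loss and averages it over several granularity levels. It is a minimal sketch under stated assumptions, not the AFLoc implementation: the function names, the temperature value, and the idea of averaging per-level losses with equal weight are all hypothetical choices made for illustration, and the embeddings are plain Python lists rather than encoder outputs.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_info_nce(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss: pair i in image_embs matches pair i in text_embs.

    The temperature default is an assumed value, not taken from the paper.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    # logits[i][j] = scaled cosine similarity between image i and text j
    logits = [[dot(i, t) / temperature for t in txts] for i in imgs]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))


def multi_level_loss(levels, temperature=0.07):
    """Average the contrastive loss over granularity levels.

    `levels` is a list of (image_embs, text_embs) pairs, one per level.
    Equal weighting across levels is an assumption for this sketch.
    """
    losses = [symmetric_info_nce(i, t, temperature) for i, t in levels]
    return sum(losses) / len(losses)


# Toy usage: two matched pairs versus the same pairs with texts swapped.
aligned = symmetric_info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = symmetric_info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
combined = multi_level_loss([
    ([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]),
    ([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]),
])
```

In the toy usage, the matched pairs yield a much smaller loss than the swapped pairs, which is the behavior a contrastive objective is meant to reward. Because no region labels enter the loss, this kind of objective needs only paired images and reports.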

Findings

01 Outperforms state-of-the-art methods in annotation-free localization and classification

02 Demonstrates strong generalization across medical imaging modalities

03 Surpasses human benchmarks in pathology localization

Abstract

Existing deep learning models for defining pathology from clinical imaging data rely on expert annotations and generalize poorly in open clinical environments. Here, we present a generalizable vision-language model for Annotation-Free pathology Localization (AFLoc). The core strength of AFLoc is extensive multi-level semantic structure-based contrastive learning, which comprehensively aligns multi-granularity medical concepts with abundant image features to adapt to the diverse expressions of pathologies without relying on expert image annotations. We conduct primary experiments on a dataset of 220K chest X-ray image-report pairs and perform validation across eight external datasets encompassing 34 types of chest pathologies. The results demonstrate that AFLoc outperforms state-of-the-art methods in both annotation-free localization and classification…

Code & Models

Repositories

yh0517/afloc (PyTorch, official)

Taxonomy

Topics: AI in cancer detection · Digital Imaging for Blood Diseases