Anatomical Structure-Guided Medical Vision-Language Pre-training

Qingqiu Li; Xiaohan Yan; Jilan Xu; Runtian Yuan; Yuejie Zhang; Rui Feng; Quanli Shen; Xiaobo Zhang; Shujun Wang

arXiv:2403.09294·cs.CV·March 15, 2024·1 cites

Open Access

TL;DR

This paper introduces an Anatomical Structure-Guided framework for medical vision-language pre-training that enhances interpretability and clinical relevance by leveraging anatomical parsing and fine-grained alignment.

Contribution

It proposes a novel anatomical structure-guided approach with report parsing, anatomical region-sentence alignment, and image-tag recognition to improve medical visual representation learning.

Findings

01. Outperforms state-of-the-art methods on five benchmarks.

02. Enhances local interpretability and semantic alignment.

03. Improves downstream task performance.

Abstract

Learning medical visual representations through vision-language pre-training has made remarkable progress. Despite its promising performance, it still faces two challenges: local alignment lacks interpretability and clinical relevance, and the internal and external representation learning of image-report pairs is insufficient. To address these issues, we propose an Anatomical Structure-Guided (ASG) framework. Specifically, we parse raw reports into triplets <anatomical region, finding, existence> and fully utilize each element as supervision to enhance representation learning. For the anatomical region, we design an automatic anatomical region-sentence alignment paradigm in collaboration with radiologists, treating these regions as the minimum semantic units to explore fine-grained local alignment. For finding and existence, we regard them as image tags, applying an image-tag recognition…
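To make the triplet idea concrete, here is a minimal Python sketch of how parsed <anatomical region, finding, existence> triplets could be turned into image-level tag targets. The `Triplet` class, the `tags_from_triplets` helper, and the example vocabulary are hypothetical names chosen for illustration; the abstract above is truncated, so the encoding the authors actually use may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    """One parsed report statement (hypothetical structure, not from the paper)."""
    region: str    # anatomical region, e.g. "left lung"
    finding: str   # observation, e.g. "pleural effusion"
    exists: bool   # True if the report asserts the finding, False if it negates it


def tags_from_triplets(triplets, vocabulary):
    """Build a multi-hot tag vector over a fixed finding vocabulary.

    A finding is marked 1 only if at least one triplet asserts it exists.
    """
    positive = {t.finding for t in triplets if t.exists}
    return [1 if finding in positive else 0 for finding in vocabulary]


# Assumed example: three statements parsed from one report.
report_triplets = [
    Triplet("left lung", "pleural effusion", True),
    Triplet("heart", "cardiomegaly", False),
    Triplet("right lung", "atelectasis", True),
]

# Assumed, illustrative finding vocabulary.
vocabulary = ["atelectasis", "cardiomegaly", "pleural effusion"]

tag_vector = tags_from_triplets(report_triplets, vocabulary)

# Regions mentioned in the report, usable as keys for region-sentence pairing.
regions = sorted({t.region for t in report_triplets})
```

In this sketch the tag vector would serve as a multi-label target for an image-tag recognition head, while the region names identify which sentences could be paired with which image regions.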

Taxonomy

Topics: Medical and Biological Sciences

Methods: Contrastive Learning