Align, Reason and Learn: Enhancing Medical Vision-and-Language Pre-training with Knowledge

Zhihong Chen; Guanbin Li; Xiang Wan

arXiv:2209.07118·cs.CL·September 16, 2022

TL;DR

This paper introduces a knowledge-enhanced medical vision-and-language pre-training method that aligns, reasons, and learns from structured medical knowledge, significantly improving performance on multiple downstream tasks.

Contribution

The paper systematically incorporates structured medical knowledge into Med-VLP by aligning representations, enabling reasoning, and emphasizing critical information, which few prior methods did.

Findings

1. Achieves state-of-the-art results on all downstream tasks.
2. Effectively integrates medical knowledge into vision-and-language models.
3. Provides a comprehensive benchmark for future research.

Abstract

Medical vision-and-language pre-training (Med-VLP) has received considerable attention owing to its applicability to extracting generic vision-and-language representations from medical images and texts. Most existing methods mainly contain three elements: uni-modal encoders (i.e., a vision encoder and a language encoder), a multi-modal fusion module, and pretext tasks, with few studies considering the importance of medical domain expert knowledge and explicitly exploiting such knowledge to facilitate Med-VLP. Although there exist knowledge-enhanced vision-and-language pre-training (VLP) methods in the general domain, most require off-the-shelf toolkits (e.g., object detectors and scene graph parsers), which are unavailable in the medical domain. In this paper, we propose a systematic and effective approach to enhance Med-VLP by structured medical knowledge from three perspectives.…

Code & Models

Repositories

zhjohnchan/arl (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning

Methods: ALIGN