Cross-modal Prototype Driven Network for Radiology Report Generation

Jun Wang, Abhir Bhalerao, and Yulan He

arXiv:2207.04818 · cs.CV · July 12, 2022

TL;DR

This paper introduces XPRONET, a cross-modal prototype driven network for radiology report generation. It learns cross-modal patterns shared between images and report text and uses them during generation, outperforming recent methods on IU-Xray and achieving comparable results on MIMIC-CXR.

Contribution

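The paper proposes a cross-modal prototype driven network built from three complementary, fully differentiable modules: a shared cross-modal prototype matrix, a cross-modal prototype network that embeds cross-modal information into the visual and textual features, and an improved multi-label contrastive loss.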

Findings

01 Outperforms recent methods on the IU-Xray benchmark

02 Achieves comparable results on MIMIC-CXR

03 Enhances multi-label prototype learning with an improved multi-label contrastive loss

Abstract

Radiology report generation (RRG) aims to automatically describe a radiology image in human-like language and could potentially support the work of radiologists, reducing the burden of manual reporting. Previous approaches often adopt an encoder-decoder architecture and focus on single-modal feature learning, while few studies explore cross-modal feature interaction. Here we propose a Cross-modal PROtotype driven NETwork (XPRONET) to promote cross-modal pattern learning and exploit it to improve the task of radiology report generation. This is achieved by three well-designed, fully differentiable and complementary modules: a shared cross-modal prototype matrix to record the cross-modal prototypes; a cross-modal prototype network to learn the cross-modal prototypes and embed the cross-modal information into the visual and textual features; and an improved multi-label contrastive loss…
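The idea of a shared prototype matrix queried by both modalities can be illustrated with a small sketch. This is not the authors' implementation: the dot-product similarity, the top-k selection, the softmax weighting, the additive fusion, and every name below (`query_prototypes`, `fuse`, `shared_prototypes`, `top_k`) are assumptions made for illustration, and the abstract does not specify these details. The official code is in the repository listed under Code & Models.

```python
import math
import random


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def query_prototypes(feature, prototypes, top_k=2):
    """Mix the top_k prototypes most similar to `feature`.

    Assumption: similarity is a plain dot product and the response is a
    softmax-weighted sum; the paper may use a different scheme.
    """
    scores = [dot(feature, p) for p in prototypes]
    top = sorted(range(len(prototypes)), key=lambda i: scores[i], reverse=True)[:top_k]
    weights = softmax([scores[i] for i in top])
    response = [
        sum(w * prototypes[i][d] for w, i in zip(weights, top))
        for d in range(len(feature))
    ]
    return response, top


def fuse(feature, response):
    """Additive fusion of a feature with its prototype response (assumption)."""
    return [f + r for f, r in zip(feature, response)]


# One prototype matrix shared by both modalities (hypothetical sizes).
random.seed(0)
NUM_PROTOTYPES, DIM = 4, 3
shared_prototypes = [
    [random.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_PROTOTYPES)
]

# Toy visual and textual features that query the same matrix.
visual_feature = [0.5, -0.2, 0.1]
textual_feature = [0.4, -0.1, 0.0]

visual_response, visual_ids = query_prototypes(visual_feature, shared_prototypes)
textual_response, textual_ids = query_prototypes(textual_feature, shared_prototypes)

visual_fused = fuse(visual_feature, visual_response)
textual_fused = fuse(textual_feature, textual_response)
```

Because both modalities read from the same matrix, similar visual and textual features tend to retrieve the same prototypes, which is one plausible way for cross-modal information to reach each single-modal feature.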

Code & Models

Repositories

markin-wang/xpronet (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Topic Modeling