Textual Inversion and Self-supervised Refinement for Radiology Report Generation

Yuanjiang Luo; Hongxiang Li; Xuan Wu; Meng Cao; Xiaoshuang Huang; Zhihong Zhu; Peixi Liao; Hu Chen; Yi Zhang

arXiv:2405.20607·cs.CV·June 7, 2024

TL;DR

This paper introduces TISR, a novel method combining textual inversion and self-supervised refinement to improve radiology report generation by bridging modality gaps and enhancing report fidelity.

Contribution

The proposed TISR method addresses both the modality gap and the lack of report content constraints, offering a plug-and-play component that improves report quality.

Findings

1. Significant performance improvements on public datasets

2. Effective reduction of modality gap in report generation

3. Enhanced report fidelity through contrastive learning

Abstract

Existing mainstream approaches follow the encoder-decoder paradigm for generating radiology reports. They focus on improving the network structure of encoders and decoders, which leads to two shortcomings: overlooking the modality gap and ignoring report content constraints. In this paper, we propose Textual Inversion and Self-supervised Refinement (TISR) to address these two issues. Specifically, textual inversion projects text and images into the same space by representing images as pseudo words, eliminating the cross-modal gap. Self-supervised refinement then refines these pseudo words by computing a contrastive loss between images and texts, which enhances the fidelity of generated reports to the images. Notably, TISR is orthogonal to most existing methods and is plug-and-play. We conduct experiments on two widely used public datasets and achieve significant improvements on…
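The abstract does not spell out the contrastive loss, so the following is only a minimal sketch of the general idea, assuming a symmetric InfoNCE-style objective over a batch of paired embeddings. The function names, the temperature value of 0.07, and the use of plain Python lists are illustrative assumptions, not the authors' implementation.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def contrastive_loss(pseudo_word_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE-style loss (assumed form, not taken from the paper).

    pseudo_word_embs[i] and text_embs[i] are treated as a matched
    image/report pair; every other pairing in the batch is a negative.
    """
    p = [l2_normalize(v) for v in pseudo_word_embs]
    t = [l2_normalize(v) for v in text_embs]
    # Cosine similarity between every pseudo word and every text, scaled.
    logits = [
        [sum(a * b for a, b in zip(pi, tj)) / temperature for tj in t]
        for pi in p
    ]
    logits_transposed = [list(col) for col in zip(*logits)]
    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_transposed))


aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this toy batch, `aligned` is lower than `swapped`: the loss shrinks as each pseudo word moves closer to its own report embedding than to the other reports in the batch, which is the kind of pressure the refinement step is described as applying.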


Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies