Generating Radiology Reports via Memory-driven Transformer
Zhihong Chen, Yan Song, Tsung-Hui Chang, Xiang Wan

TL;DR
This paper introduces a memory-driven Transformer model for automatic radiology report generation, significantly improving report quality and clinical relevance on major datasets, including the first results on MIMIC-CXR.
Contribution
It presents a novel memory-driven Transformer architecture with relational memory and memory-driven normalization for improved medical report generation.
Findings
Outperforms previous models on IU X-Ray and MIMIC-CXR datasets
Able to generate long, detailed reports with medical terms
Produces meaningful image-text attention mappings
Abstract
Medical imaging is frequently used in clinical practice and trials for diagnosis and treatment. Writing imaging reports is time-consuming and can be error-prone for inexperienced radiologists. Automatically generating radiology reports is therefore highly desirable: it lightens the workload of radiologists and promotes clinical automation, an essential step in applying artificial intelligence to the medical domain. In this paper, we propose to generate radiology reports with a memory-driven Transformer, where a relational memory is designed to record key information of the generation process and a memory-driven conditional layer normalization is applied to incorporate the memory into the decoder of the Transformer. Experimental results on two prevailing radiology report datasets, IU X-Ray and MIMIC-CXR, show that our proposed approach outperforms previous models with respect…
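To make the idea of a memory-driven conditional layer normalization more concrete, here is a minimal sketch in plain Python. It assumes that the memory state is projected into offsets that are added to the usual layer-norm scale and shift; the abstract above does not spell out the exact formulation, so the function names, the single linear projection, and all shapes are illustrative assumptions, not the authors' implementation.

```python
import math


def linear(weights, bias, vec):
    """Affine map; `weights` is a list of rows (out_dim x in_dim)."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def memory_conditioned_layer_norm(x, memory, gamma, beta,
                                  w_gamma, b_gamma, w_beta, b_beta,
                                  eps=1e-6):
    """Layer norm whose scale and shift are adjusted by a memory vector.

    Assumption (hypothetical, for illustration): the memory is mapped by
    one linear layer each to an offset for gamma and an offset for beta.
    """
    # Standard layer-norm statistics over the feature dimension.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    std = math.sqrt(var + eps)

    # Memory-dependent offsets for the scale and shift parameters.
    d_gamma = linear(w_gamma, b_gamma, memory)
    d_beta = linear(w_beta, b_beta, memory)

    return [(g + dg) * (v - mean) / std + (b + db)
            for v, g, dg, b, db in zip(x, gamma, d_gamma, beta, d_beta)]


# Toy inputs: a 4-dimensional hidden state and a 2-dimensional memory.
x = [1.0, 2.0, 3.0, 4.0]
memory = [0.5, -0.5]
gamma, beta = [1.0] * 4, [0.0] * 4
zeros_w = [[0.0, 0.0] for _ in range(4)]
zeros_b = [0.0] * 4

# With zero projections the layer reduces to ordinary layer normalization.
plain = memory_conditioned_layer_norm(
    x, memory, gamma, beta, zeros_w, zeros_b, zeros_w, zeros_b)

# A projection that copies memory[0] into every shift offset.
shift_w = [[1.0, 0.0] for _ in range(4)]
shifted = memory_conditioned_layer_norm(
    x, memory, gamma, beta, zeros_w, zeros_b, shift_w, zeros_b)
```

In this sketch the memory only modulates the normalization parameters, so the decoder's attention and feed-forward sublayers would stay unchanged; whether the paper uses a single linear layer or a deeper projection is not stated in the text above.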
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Multi-Head Attention · Attention Is All You Need · Residual Connection · Dropout · Adam · Byte Pair Encoding · Softmax
