EIR: Enhanced Image Representations for Medical Report Generation
Qiang Sun, Zongcheng Ji, Yinlong Xiao, Peng Chang, Jun Yu

TL;DR
This paper introduces EIR, a novel method that fuses metadata with medical images using cross-modal transformers and domain-specific pre-trained models to improve automatic chest X-ray report generation.
Contribution
The paper proposes an approach that addresses two issues in medical report generation: information asymmetry, by fusing metadata and image representations with cross-modal transformers, and the domain gap, by encoding images with medical-domain pre-trained models.
Findings
Improved report accuracy on MIMIC and Open-I datasets.
Effective fusion of metadata and image representations.
Bridging the domain gap with medical domain pre-trained models.
Abstract
Generating medical reports from chest X-ray images is a critical and time-consuming task for radiologists, especially in emergencies. To alleviate the stress on radiologists and reduce the risk of misdiagnosis, numerous research efforts have been dedicated to automatic medical report generation in recent years. Most recent studies have developed methods that represent images by utilizing various medical metadata, such as the clinical document history of the current patient and the medical graphs constructed from retrieved reports of other similar patients. However, all existing methods integrate additional metadata representations with visual representations through a simple "Add and LayerNorm" operation, which suffers from the information asymmetry problem due to the distinct distributions between them. In addition, chest X-ray images are usually represented using pre-trained models…
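To make the contrast in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It compares an "Add and LayerNorm" fusion with a single-head cross-attention fusion in which visual tokens attend over metadata tokens. The function names, tensor shapes, the pooled metadata vector used by the baseline, and the projection matrices `w_q`, `w_k`, `w_v` are all assumptions made for illustration; the paper's actual cross-modal transformer may differ.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    """Normalize each row to zero mean and unit variance (no learned scale/shift)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def add_and_layernorm(visual, metadata_vec):
    """Baseline fusion: add a metadata vector to every visual token, then normalize.

    visual: (n_tokens, d); metadata_vec: (d,) pooled metadata (an assumption here).
    """
    return layer_norm(visual + metadata_vec)


def cross_attention_fusion(visual, metadata, w_q, w_k, w_v):
    """Hypothetical single-head cross-attention: visual tokens query metadata tokens.

    visual: (n_tokens, d); metadata: (m_tokens, d); w_q, w_k, w_v: (d, d).
    Returns the fused tokens and the attention weights.
    """
    q = visual @ w_q
    k = metadata @ w_k
    v = metadata @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)          # each visual token weights the metadata tokens
    fused = layer_norm(visual + attn @ v)    # residual connection, then normalization
    return fused, attn


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 8
    visual = rng.normal(size=(4, d))                        # 4 visual tokens
    metadata = rng.normal(loc=3.0, scale=5.0, size=(3, d))  # 3 metadata tokens, different distribution
    w_q, w_k, w_v = (rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))

    baseline = add_and_layernorm(visual, metadata.mean(axis=0))
    fused, attn = cross_attention_fusion(visual, metadata, w_q, w_k, w_v)
    print(baseline.shape, fused.shape, attn.shape)
```

In the baseline, every visual token receives the same metadata vector regardless of its content. In the cross-attention sketch, each visual token computes its own weighting over the metadata tokens, which is one plausible way to handle representations drawn from different distributions.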
Taxonomy
Topics: Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques · Topic Modeling
