On the Importance of Image Encoding in Automated Chest X-Ray Report Generation
Otabek Nazarov, Mohammad Yaqub, Karthik Nandakumar

TL;DR
This paper investigates how different image encoding methods affect automated chest X-ray report generation, showing that fine-grained encoding outperforms the other approaches on both NLP and clinical metrics.
Contribution
The study introduces and evaluates four image encoding strategies, including a novel Cluster-CLIP encoder, demonstrating the critical role of encoding quality in report generation performance.
Findings
Fine-grained encoding outperforms other methods in NLP and clinical metrics.
CLIP-based encoders achieve comparable NLP results to CNN encoders.
Cluster-CLIP provides more discriminative and explainable representations.
Abstract
Chest X-ray is one of the most popular medical imaging modalities due to its accessibility and effectiveness. However, there is a chronic shortage of well-trained radiologists who can interpret these images and diagnose the patient's condition. Automated radiology report generation can therefore be a very helpful tool in clinical practice. A typical report generation workflow consists of two main steps: (i) encoding the image into a latent space and (ii) generating the text of the report based on the latent image embedding. Many existing report generation techniques use a standard convolutional neural network (CNN) architecture for image encoding, followed by a Transformer-based decoder for medical text generation. In most cases, the CNN and the decoder are trained jointly in an end-to-end fashion. In this work, we primarily focus on understanding the relative importance of encoder and…
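The two-step workflow described in the abstract (encode the image into a latent space, then generate text conditioned on that embedding) can be sketched as a toy in pure Python. This is a hypothetical illustration, not the authors' encoders or decoder: the hand-crafted [mean, max] patch features stand in for learned CNN or CLIP features, the names `encode_patches`, `encode_global` and `cross_attend` are invented for this sketch, and only a single cross-attention step of a Transformer decoder is shown. It contrasts a fine-grained encoding (one embedding per image patch) with a single globally pooled embedding.

```python
import math
from typing import List

Vector = List[float]
Image = List[List[float]]


def encode_patches(image: Image, patch: int) -> List[Vector]:
    """Step (i), fine-grained variant: one embedding per patch x patch tile.

    Hypothetical feature: [mean, max] of the tile's pixel intensities,
    standing in for a learned CNN/CLIP feature vector.
    """
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image sides must be divisible by patch")
    embeddings = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tile = [image[r][c]
                    for r in range(top, top + patch)
                    for c in range(left, left + patch)]
            embeddings.append([sum(tile) / len(tile), max(tile)])
    return embeddings


def encode_global(image: Image, patch: int) -> List[Vector]:
    """Step (i), coarse variant: average-pool all patch embeddings into one vector."""
    patches = encode_patches(image, patch)
    dim = len(patches[0])
    return [[sum(p[i] for p in patches) / len(patches) for i in range(dim)]]


def cross_attend(query: Vector, memory: List[Vector]) -> Vector:
    """Step (ii) building block: scaled dot-product attention of one decoder
    query over the encoder embeddings (keys = values = memory)."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in memory]
    m = max(scores)  # subtract the max before exp() for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    return [sum(w * v[i] for w, v in zip(weights, memory)) for i in range(d)]


# Hypothetical 4x4 "X-ray" with one bright region in the top-left quadrant.
xray = [
    [0.9, 0.8, 0.1, 0.1],
    [0.8, 0.9, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1],
]
fine = encode_patches(xray, patch=2)    # 4 embeddings, one per quadrant
coarse = encode_global(xray, patch=2)   # 1 pooled embedding
q_bright, q_dark = [5.0, 5.0], [-5.0, -5.0]  # hypothetical decoder queries
```

With the pooled embedding there is only one vector to attend to, so `cross_attend` returns the same context for every query; with per-patch embeddings, `q_bright` draws mostly on the bright quadrant and `q_dark` on the others. This is offered as one intuition for why a finer-grained encoding can give the decoder more to work with, not as the paper's explanation of its results.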
Taxonomy
Topics: Radiomics and Machine Learning in Medical Imaging · Topic Modeling · Lung Cancer Diagnosis and Treatment
Methods: Contrastive Language-Image Pre-training
