Grounded Multimodal Retrieval-Augmented Drafting of Radiology Impressions Using Case-Based Similarity Search
Himadri S Samanta

TL;DR
This paper introduces a retrieval-augmented system for drafting radiology impressions that targets factual accuracy and interpretability by combining multimodal embeddings, case-based retrieval, and citation constraints.
Contribution
It presents a novel multimodal retrieval-augmented drafting approach that improves factual grounding and trustworthiness in radiology report generation.
Findings
Recall@5 exceeds 0.95 for relevant findings
Retrieval significantly improves draft factual accuracy
Grounded drafts offer better interpretability and citation traceability
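The Recall@5 figure above depends on how a "hit" is counted, and this summary does not give the paper's exact definition. As a minimal sketch, assuming the common hit-rate reading (a query counts as a success if at least one relevant case appears in its top k results), the metric could be computed as follows. The function name and the list-of-lists input layout are hypothetical, chosen only for illustration.

```python
def recall_at_k(retrieved_ids, relevant_ids, k=5):
    """Fraction of queries with at least one relevant case in the top-k results.

    retrieved_ids: one ranked list of case ids per query (best match first).
    relevant_ids:  one collection of ground-truth relevant case ids per query.
    Assumption: hit-rate style recall; the paper may define it differently.
    """
    if len(retrieved_ids) != len(relevant_ids):
        raise ValueError("retrieved_ids and relevant_ids must have the same length")
    if not retrieved_ids:
        raise ValueError("at least one query is required")

    hits = 0
    for retrieved, relevant in zip(retrieved_ids, relevant_ids):
        # A query is a hit if any of its top-k retrieved ids is relevant.
        if set(retrieved[:k]) & set(relevant):
            hits += 1
    return hits / len(retrieved_ids)
```

For example, with two queries where only the first has a relevant case in its top 5, `recall_at_k([[1, 2, 3], [4, 5, 6]], [[3], [9]])` evaluates to 0.5.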
Abstract
Automated radiology report generation has gained increasing attention with the rise of deep learning and large language models. However, fully generative approaches often suffer from hallucinations and lack clinical grounding, limiting their reliability in real-world workflows. In this study, we propose a multimodal retrieval-augmented generation (RAG) system for grounded drafting of chest radiograph impressions. The system combines contrastive image-text embeddings, case-based similarity retrieval, and citation-constrained draft generation to ensure factual alignment with historical radiology reports. A curated subset of the MIMIC-CXR dataset was used to construct a multimodal retrieval database. Image embeddings were generated using CLIP encoders, while textual embeddings were derived from structured impression sections. A fusion similarity framework was implemented using FAISS…
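The abstract is cut off before the fusion similarity is specified, so the following is only a sketch of one plausible reading: a weighted sum of image-embedding and text-embedding cosine similarities. The weight `alpha`, the helper names, and the use of both modalities on the query side are assumptions, not details from the paper. The sketch uses NumPy brute-force search instead of FAISS to stay dependency-light; because each fused vector is a single concatenated vector scored by inner product, the same vectors could in principle be stored in an inner-product index.

```python
import numpy as np

def l2_normalize(x):
    """Scale vectors to unit length along the last axis."""
    x = np.asarray(x, dtype=np.float32)
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def fuse(img_emb, txt_emb, alpha=0.5):
    """Concatenate normalized image and text embeddings with sqrt weights.

    The inner product of two fused vectors equals
    alpha * cos(image) + (1 - alpha) * cos(text),
    so one inner-product search yields the weighted fusion score.
    alpha is a hypothetical tuning parameter.
    """
    img = l2_normalize(img_emb)
    txt = l2_normalize(txt_emb)
    return np.concatenate(
        [np.sqrt(alpha) * img, np.sqrt(1.0 - alpha) * txt], axis=-1
    )

def top_k(query_vec, db_vecs, k=5):
    """Return indices and fusion scores of the k most similar database cases."""
    scores = db_vecs @ query_vec          # one fusion score per stored case
    order = np.argsort(-scores)[:k]       # highest score first
    return order, scores[order]
```

A usage sketch with random stand-in embeddings (real ones would come from the image and text encoders): build `db = fuse(db_img, db_txt, alpha=0.7)` once, fuse the query the same way, and call `top_k(query, db, k=5)` to get the candidate cases whose reports would then be cited in the draft.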
