Retrieval-Guided Generation for Safer Histopathology Image Captioning
Md. Enamul Hoq, Wataru Uegami, Saghir Alfasly, Ghazal Alabtah, Sahar Rahimi Malakshan, Armita Kazemi, Alex T. Schmitgen, Fred Prior, H.R. Tizhoosh

TL;DR
Retrieval-guided generation (RGG) improves the safety and accuracy of histopathology image captioning by summarizing expert text from similar cases, reducing hallucinations and factual errors.
Contribution
This paper introduces retrieval-guided generation for pathology image captioning, enhancing semantic alignment and interpretability over traditional generative models.
Findings
On the ARCH dataset, RGG achieves a cosine similarity to ground-truth captions of ~0.60, versus ~0.47 for MedGemma.
Pathologist review shows better preservation of morphology-relevant terminology.
Retrieval-guided captions contain fewer unsupported diagnoses.
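For readers unfamiliar with the metric, the following is a minimal sketch of cosine similarity between two embedding vectors. It assumes the generated and ground-truth captions have already been turned into fixed-length vectors by some text encoder; the listing does not say which encoder the authors used, and the function name here is illustrative.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # The angle is undefined when either vector has zero length.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)
```

Values near 1 mean the two embeddings point in almost the same direction, so a move from ~0.47 to ~0.60 indicates captions whose embeddings sit closer to those of the reference text.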
Abstract
Generative vision-language models can produce fluent medical image captions but remain prone to hallucination, over-specific diagnostic claims, and factual inconsistency, all of which are serious issues in pathology. We investigate retrieval-guided generation (RGG) as a safer alternative, where captions are formed by summarizing expert text from visually similar cases rather than generated de novo. On the ARCH histopathology dataset, RGG improves semantic alignment with ground truth, achieving cosine similarity of 0.60 versus 0.47 for MedGemma, with non-overlapping confidence intervals indicating a robust gain. A pathologist-led qualitative review shows better preservation of morphology-relevant terminology and fewer unsupported diagnoses, while revealing failure modes such as concept mixing and inherited over-specific labeling. Overall, retrieval-guided captioning offers a more…
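The retrieval step described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: it assumes each archived case is stored as an (image embedding, expert caption) pair, ranks cases by cosine similarity to the query image embedding, and returns the top-k expert captions. The names `retrieve_expert_captions` and `archive` are hypothetical, and the summarization step that would turn the retrieved text into a single caption is left out because the listing does not specify it.

```python
import math


def _cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve_expert_captions(query_embedding, archive, k=3):
    """Return the expert captions of the k most similar archived cases.

    archive is a list of (embedding, caption) pairs (an assumed layout).
    """
    ranked = sorted(
        archive,
        key=lambda case: _cosine(query_embedding, case[0]),
        reverse=True,  # most similar cases first
    )
    return [caption for _, caption in ranked[:k]]


# Toy usage with made-up 2-D embeddings and placeholder captions.
archive = [
    ([1.0, 0.0], "caption A"),
    ([0.0, 1.0], "caption B"),
    ([0.9, 0.1], "caption C"),
]
retrieved = retrieve_expert_captions([1.0, 0.0], archive, k=2)
```

Because the output text comes from expert-written captions of similar cases, errors in those captions can carry over, which is consistent with the "inherited over-specific labeling" failure mode noted in the abstract.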
