Retrieval-Guided Generation for Safer Histopathology Image Captioning

Md. Enamul Hoq; Wataru Uegami; Saghir Alfasly; Ghazal Alabtah; Sahar Rahimi Malakshan; Armita Kazemi; Alex T. Schmitgen; Fred Prior; H.R. Tizhoosh

arXiv:2605.00893·cs.CV·May 5, 2026

TL;DR

Retrieval-guided generation (RGG) improves the safety and accuracy of histopathology image captioning by summarizing expert text from similar cases, reducing hallucinations and factual errors.

Contribution

This paper introduces retrieval-guided generation for pathology image captioning, improving semantic alignment with ground-truth captions and interpretability relative to purely generative models.

Findings

01

On the ARCH histopathology dataset, RGG achieves a cosine similarity with ground-truth captions of ~0.60, versus ~0.47 for MedGemma.

02

A pathologist-led qualitative review shows better preservation of morphology-relevant terminology.

03

Retrieval-guided captions contain fewer unsupported diagnoses than generated ones.

Abstract

Generative vision-language models can produce fluent medical image captions but remain prone to hallucination, over-specific diagnostic claims, and factual inconsistency, all of which are serious issues in pathology. We investigate retrieval-guided generation (RGG) as a safer alternative, where captions are formed by summarizing expert text from visually similar cases rather than generated de novo. On the ARCH histopathology dataset, RGG improves semantic alignment with ground truth, achieving cosine similarity of $\approx$ 0.60 versus $\approx$ 0.47 from MedGemma, with non-overlapping confidence intervals indicating a robust gain. A pathologist-led qualitative review shows better preservation of morphology-relevant terminology and fewer unsupported diagnoses, while revealing failure modes such as concept mixing and inherited over-specific labeling. Overall, retrieval-guided captioning offers a more…
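The retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the image encoder, the similarity measure used for retrieval, or the summarizer, so this sketch assumes precomputed embedding vectors, cosine similarity for ranking, and a plain text prompt handed to some downstream summarizer. The names `cosine_similarity`, `retrieve_top_k`, and `build_summary_prompt` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(
    query: Vector,
    index: List[Tuple[Vector, str]],
    k: int = 3,
) -> List[Tuple[float, str]]:
    """Rank indexed (embedding, expert caption) pairs by similarity to the query.

    Assumption: `index` holds embeddings of reference images paired with
    their expert-written captions; how those embeddings are produced is
    outside this sketch.
    """
    scored = [(cosine_similarity(query, emb), caption) for emb, caption in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_summary_prompt(retrieved: List[Tuple[float, str]]) -> str:
    """Assemble retrieved captions into a prompt for a summarizer (hypothetical format)."""
    lines = ["Summarize the shared findings in these expert captions:"]
    for score, caption in retrieved:
        lines.append(f"- (similarity {score:.2f}) {caption}")
    return "\n".join(lines)
```

The output caption is then a summary of text that experts already wrote for similar cases, which is the sense in which the abstract contrasts RGG with de novo generation. The same `cosine_similarity` function could be applied to text embeddings of a produced caption and its ground-truth caption to obtain an alignment score of the kind reported above, although the text embedding model used in the paper is not stated in the abstract.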
