Histopathology Image Report Generation by Vision Language Model with Multimodal In-Context Learning

Shih-Wen Liu; Hsuan-Yu Fan; Wei-Ta Chu; Fu-En Yang; Yu-Chiang Frank Wang

arXiv:2506.17645 · cs.CV · June 24, 2025

TL;DR

This paper introduces PathGenIC, a multimodal in-context learning framework that dynamically retrieves similar histopathology image-report pairs to generate accurate medical reports, achieving state-of-the-art results on the HistGen benchmark.

Contribution

The paper presents an in-context learning approach that combines dynamic retrieval of similar WSI-report pairs with adaptive feedback for histopathology report generation, improving on prior methods on the HistGen benchmark.

Findings

01. Achieves state-of-the-art BLEU, METEOR, and ROUGE-L scores on the HistGen benchmark.

02. Demonstrates robustness across report lengths and disease categories.

03. Bridges vision and language in medical report generation through in-context learning.

Abstract

Automating medical report generation from histopathology images is a critical challenge requiring effective visual representations and domain-specific knowledge. Inspired by the common practices of human experts, we propose an in-context learning framework called PathGenIC that integrates context derived from the training set with a multimodal in-context learning (ICL) mechanism. Our method dynamically retrieves semantically similar whole slide image (WSI)-report pairs and incorporates adaptive feedback to enhance contextual relevance and generation quality. Evaluated on the HistGen benchmark, the framework achieves state-of-the-art results, with significant improvements across BLEU, METEOR, and ROUGE-L metrics, and demonstrates robustness across diverse report lengths and disease categories. By maximizing training data utility and bridging vision and language with ICL, our work offers…
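The retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each WSI has already been reduced to a single feature vector, uses cosine similarity as a stand-in for whatever similarity measure the paper uses, and omits the adaptive feedback component entirely. The names `Example`, `retrieve_top_k`, and `build_prompt`, the toy vectors, and the placeholder report strings are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Example:
    """One training-set item: a slide-level feature vector and its report."""
    embedding: list[float]  # hypothetical precomputed WSI feature vector
    report: str             # placeholder text, not a real pathology report


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(query: list[float], bank: list[Example], k: int = 2) -> list[Example]:
    """Return the k training examples most similar to the query embedding."""
    ranked = sorted(bank, key=lambda ex: cosine(query, ex.embedding), reverse=True)
    return ranked[:k]


def build_prompt(neighbors: list[Example]) -> str:
    """Assemble a text-only in-context prompt from the retrieved reports.

    In a real multimodal setup the retrieved slide features would be passed
    to the vision language model as well; only the text side is shown here.
    """
    lines = [f"Example {i} report: {ex.report}" for i, ex in enumerate(neighbors, 1)]
    lines.append("Report for the query slide:")
    return "\n".join(lines)


# Toy data: three "training" items and one query vector (all made up).
bank = [
    Example([1.0, 0.0, 0.0], "placeholder report A"),
    Example([0.0, 1.0, 0.0], "placeholder report B"),
    Example([0.9, 0.1, 0.0], "placeholder report C"),
]
query = [1.0, 0.0, 0.0]

neighbors = retrieve_top_k(query, bank, k=2)
prompt = build_prompt(neighbors)
print(prompt)
```

With these toy vectors the query is closest to items A and C, so their reports become the in-context examples placed ahead of the generation cue.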

Taxonomy

Topics: Multimodal Machine Learning Applications · AI in cancer detection · Generative Adversarial Networks and Image Synthesis