Histopathology Image Report Generation by Vision Language Model with Multimodal In-Context Learning
Shih-Wen Liu, Hsuan-Yu Fan, Wei-Ta Chu, Fu-En Yang, Yu-Chiang Frank Wang

TL;DR
This paper introduces PathGenIC, a multimodal in-context learning framework that dynamically retrieves similar histopathology image-report pairs to generate accurate medical reports, achieving state-of-the-art results on the HistGen benchmark.
Contribution
The paper presents an in-context learning approach for histopathology report generation that combines dynamic retrieval of similar WSI-report pairs with adaptive feedback, achieving state-of-the-art results on the HistGen benchmark.
Findings
Achieves state-of-the-art BLEU, METEOR, and ROUGE-L scores on the HistGen benchmark.
Demonstrates robustness across report lengths and disease categories.
Bridges vision and language through in-context learning while maximizing the utility of the training data.
Abstract
Automating medical report generation from histopathology images is a critical challenge requiring effective visual representations and domain-specific knowledge. Inspired by the common practices of human experts, we propose an in-context learning framework called PathGenIC that integrates context derived from the training set with a multimodal in-context learning (ICL) mechanism. Our method dynamically retrieves semantically similar whole slide image (WSI)-report pairs and incorporates adaptive feedback to enhance contextual relevance and generation quality. Evaluated on the HistGen benchmark, the framework achieves state-of-the-art results, with significant improvements across BLEU, METEOR, and ROUGE-L metrics, and demonstrates robustness across diverse report lengths and disease categories. By maximizing training data utility and bridging vision and language with ICL, our work offers…
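The retrieval step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each slide has already been reduced to a fixed-length feature vector, uses cosine similarity as the similarity measure, and represents the retrieved context as plain text. The names `cosine_similarity`, `retrieve_top_k`, and `build_icl_prompt`, as well as the prompt wording, are hypothetical and chosen only for illustration; the adaptive feedback component is not modeled here.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical type: (slide feature vector, reference report text).
Example = Tuple[Sequence[float], str]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(query: Sequence[float], bank: List[Example], k: int) -> List[str]:
    """Return the reports of the k training examples most similar to the query slide."""
    ranked = sorted(bank, key=lambda ex: cosine_similarity(query, ex[0]), reverse=True)
    return [report for _, report in ranked[:k]]


def build_icl_prompt(retrieved_reports: List[str]) -> str:
    """Assemble a text prompt that lists retrieved reports as in-context examples.

    Assumption: a real multimodal model would also receive the slide features
    of each example; only the text side is shown here.
    """
    lines = ["Reference reports from similar slides:"]
    for i, report in enumerate(retrieved_reports, start=1):
        lines.append(f"Example {i}: {report}")
    lines.append("Write a report for the query slide.")
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy two-dimensional features; real slide embeddings would be much larger.
    training_bank: List[Example] = [
        ([1.0, 0.0], "Report A"),
        ([0.0, 1.0], "Report B"),
        ([0.9, 0.1], "Report C"),
    ]
    nearest = retrieve_top_k([1.0, 0.0], training_bank, k=2)
    print(build_icl_prompt(nearest))
```

Sorting the whole bank is adequate for a toy example; a large training set would more plausibly use an approximate nearest-neighbor index, which is a design choice outside what the abstract specifies.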
Taxonomy
Topics: Multimodal Machine Learning Applications · AI in cancer detection · Generative Adversarial Networks and Image Synthesis
