On the Importance of Text Preprocessing for Multimodal Representation Learning and Pathology Report Generation

Ruben T. Lucassen; Tijn van de Luijtgaarden; Sander P.J. Moonemans; Gerben E. Breimer; Willeke A.M. Blokx; Mitko Veta

arXiv:2502.19285 · cs.CV · June 9, 2025

Open Access

TL;DR

This study examines how restricting pathology reports to information that can be inferred from the paired whole slide images affects multimodal representation learning and report generation. Training on such preprocessed reports reduces hallucinated sentences in generated reports, which highlights the importance of text selection.

Contribution

The paper demonstrates that text preprocessing improves the quality of generated reports and reduces hallucinations, offering guidance on how to prepare report data for vision-language models in pathology.

Findings

01. Preprocessed reports prevent hallucinated sentences in generated reports.

02. Training on full reports improves cross-modal retrieval performance.

03. Preprocessing enhances report quality despite lower retrieval accuracy.
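
To make the retrieval finding concrete, the sketch below computes recall@K for image-to-report retrieval, a commonly used cross-modal retrieval metric. This page does not state which metric the paper reports, so the metric choice, the function name `recall_at_k`, and the toy similarity matrix are all assumptions made for illustration.

```python
def recall_at_k(similarity, k):
    """Fraction of images whose matching report ranks in the top k.

    similarity[i][j] is an assumed similarity score between image i and
    report j; the matching report for image i is assumed to be report i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Rank report indices by descending similarity and keep the top k.
        top_k = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        hits += i in top_k
    return hits / len(similarity)


# Toy 3x3 similarity matrix (illustrative values, not from the paper).
toy_similarity = [
    [0.9, 0.1, 0.0],
    [0.2, 0.3, 0.8],
    [0.1, 0.7, 0.6],
]
```

In this toy case only the first image retrieves its own report at rank 1, while all three retrieve it within the top 2.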

Abstract

Vision-language models in pathology enable multimodal case retrieval and automated report generation. Many of the models developed so far, however, have been trained on pathology reports that include information which cannot be inferred from paired whole slide images (e.g., patient history), potentially leading to hallucinated sentences in generated reports. To this end, we investigate how the selection of information from pathology reports for vision-language modeling affects the quality of the multimodal representations and generated reports. More concretely, we compare a model trained on full reports against a model trained on preprocessed reports that only include sentences describing the cell and tissue appearances based on the H&E-stained slides. For the experiments, we built upon the BLIP-2 framework and used a cutaneous melanocytic lesion dataset of 42,433 H&E-stained whole…
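
The idea of keeping only sentences that describe what is visible on the slide can be sketched as a sentence-level filter. The version below is a minimal, rule-based stand-in: the cue list `NON_IMAGE_CUES`, the helper names, and the example report are hypothetical, and the excerpt above does not say how the authors actually selected sentences.

```python
import re

# Hypothetical cue words suggesting a sentence is NOT derivable from the
# image (e.g. patient history). Chosen for illustration; not from the paper.
NON_IMAGE_CUES = ("history", "clinical", "previous", "patient", "referred")


def split_sentences(report: str) -> list[str]:
    """Naively split a report on sentence-ending punctuation plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", report.strip())
    return [p for p in parts if p]


def keep_image_derived(report: str) -> str:
    """Drop sentences containing any cue word; keep the rest in order."""
    kept = [
        sentence
        for sentence in split_sentences(report)
        if not any(
            re.search(rf"\b{cue}\b", sentence, re.IGNORECASE)
            for cue in NON_IMAGE_CUES
        )
    ]
    return " ".join(kept)


# Invented example report, for illustration only.
example_report = (
    "Patient has a history of melanoma. "
    "The epidermis shows nests of melanocytes. "
    "No mitoses are seen."
)
filtered = keep_image_derived(example_report)
```

A keyword filter like this is crude; a trained sentence classifier would be a natural alternative, but which approach the paper uses is not stated here.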

Taxonomy

Topics: Multimodal Machine Learning Applications · AI in cancer detection · Domain Adaptation and Few-Shot Learning