PathGen-1.6M: 1.6 Million Pathology Image-text Pairs Generation through Multi-agent Collaboration

Yuxuan Sun; Yunlong Zhang; Yixuan Si; Chenglu Zhu; Zhongyi Shui; Kai Zhang; Jingxiong Li; Xingheng Lyu; Tao Lin; Lin Yang

arXiv:2407.00203 · cs.CV · July 2, 2024 · 3 cites

TL;DR

This paper introduces PathGen-1.6M, a dataset of 1.6 million pathology image-caption pairs. The captions are generated through multi-agent collaboration for image patches extracted from large-scale WSI datasets such as TCGA, and the dataset is used to train pathology vision-language models such as PathGen-CLIP.

Contribution

The authors develop a scalable method to generate high-quality pathology image-caption pairs using multi-agent collaboration, enhancing pathology-specific vision-language models and instruction-tuned multimodal systems.

Findings

01. PathGen-CLIP outperforms existing models on nine zero-shot classification tasks.

02. Generated dataset improves pathology image analysis accuracy.

03. Instruction-tuned models demonstrate enhanced multimodal understanding.

Abstract

Vision Language Models (VLMs) like CLIP have attracted substantial attention in pathology, serving as backbones for applications such as zero-shot image classification and Whole Slide Image (WSI) analysis. Additionally, they can function as vision encoders when combined with large language models (LLMs) to support broader capabilities. Current efforts to train pathology VLMs rely on pathology image-text pairs from platforms like PubMed, YouTube, and Twitter, which provide limited, unscalable data with generally suboptimal image quality. In this work, we leverage large-scale WSI datasets like TCGA to extract numerous high-quality image patches. We then train a large multimodal model to generate captions for these images, creating PathGen-1.6M, a dataset containing 1.6 million high-quality image-caption pairs. Our approach involves multiple agent models collaborating to extract…
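The zero-shot image classification mentioned in the abstract can be illustrated with a minimal sketch. It assumes the image and the class prompts have already been embedded by a CLIP-style model; the toy vectors, the labels, the helper names, and the `logit_scale` value of 100 are illustrative assumptions, not values or code from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_emb, text_embs, labels, logit_scale=100.0):
    """Pick the label whose text embedding is most similar to the image embedding.

    logit_scale is an assumed temperature; real models learn their own value.
    """
    img = l2_normalize(image_emb)
    logits = []
    for text_emb in text_embs:
        txt = l2_normalize(text_emb)
        cosine = sum(a * b for a, b in zip(img, txt))
        logits.append(logit_scale * cosine)
    probs = softmax(logits)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs


# Hypothetical embeddings standing in for encoder outputs.
labels = ["tumor", "normal"]
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # e.g. prompts for each class
image_emb = [0.9, 0.1, 0.0]                     # e.g. one image patch

pred, probs = zero_shot_classify(image_emb, text_embs, labels)
print(pred, probs)
```

No classifier is trained here: the class set is defined entirely by the text prompts, which is why this setup is called zero-shot.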

Code & Models

Repositories

dazhangyu123/acmil (PyTorch)

Taxonomy

Topics: Biomedical Text Mining and Ontologies · AI in cancer detection

Methods: Softmax · Attention Is All You Need · Contrastive Language-Image Pre-training