Libra-MIL: Multimodal Prototypes Stereoscopic Infused with Task-specific Language Priors for Few-shot Whole Slide Image Classification
Zhenfeng Zhuang, Fangyu Zhou, Liansheng Wang

TL;DR
Libra-MIL introduces a multimodal prototype approach with task-specific language priors and stereoscopic optimal transport to improve few-shot whole slide image classification and interpretability in computational pathology.
Contribution
The paper presents a novel multimodal prototype-based MIL method that enables bidirectional cross-modal interaction and leverages LLM-generated descriptions for better generalization.
Findings
Outperforms existing methods in few-shot classification accuracy.
Enhances interpretability through semantic alignment of prototypes.
Demonstrates superior generalization across multiple cancer datasets.
Abstract
While Large Language Models (LLMs) are emerging as a promising direction in computational pathology, the substantial computational cost of giga-pixel Whole Slide Images (WSIs) necessitates the use of Multi-Instance Learning (MIL) to enable effective modeling. A key challenge is that pathological tasks typically provide only bag-level labels, while instance-level descriptions generated by LLMs often suffer from bias due to a lack of fine-grained medical knowledge. To address this, we propose that constructing task-specific pathological entity prototypes is crucial for learning generalizable features and enhancing model interpretability. Furthermore, existing vision-language MIL methods often employ unidirectional guidance, limiting cross-modal synergy. In this paper, we introduce a novel approach, Multimodal Prototype-based Multi-Instance Learning, that promotes bidirectional interaction…
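The abstract is truncated here, so the paper's exact formulation is not available in this summary. As a generic illustration of the optimal transport idea named in the TL;DR, the sketch below aligns the patch embeddings of one bag (one WSI) with a small set of prototypes using entropy-regularized optimal transport (Sinkhorn iterations). It is a minimal sketch under stated assumptions, not the authors' method: the names `sinkhorn_plan` and `cosine_cost`, the uniform marginals, the cosine cost, the value of `eps`, and the random stand-in features are all hypothetical choices made for illustration.

```python
import numpy as np

def cosine_cost(x, y):
    """Pairwise cosine distance (1 - cosine similarity) between rows of x and y."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    y = y / np.linalg.norm(y, axis=1, keepdims=True)
    return 1.0 - x @ y.T

def sinkhorn_plan(cost, eps=0.5, n_iters=200):
    """Entropy-regularized transport plan between two uniform marginals.

    Assumption: every patch and every prototype carries equal mass.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)      # mass on patches (instances)
    b = np.full(m, 1.0 / m)      # mass on prototypes
    K = np.exp(-cost / eps)      # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)          # rescale rows toward marginal a
        v = b / (K.T @ u)        # rescale columns toward marginal b
    return u[:, None] * K * v[None, :]

# Random stand-ins for real features (hypothetical shapes):
rng = np.random.default_rng(0)
patches = rng.normal(size=(50, 16))     # 50 patch embeddings from one slide
prototypes = rng.normal(size=(4, 16))   # 4 prototype embeddings

plan = sinkhorn_plan(cosine_cost(patches, prototypes))

# Prototype-wise bag summary: transport-weighted mean of the patches.
bag_features = (plan.T @ patches) / plan.sum(axis=0)[:, None]
```

Each row of `plan` indicates how one patch's mass is spread across the prototypes, so the plan can also be read as a soft patch-to-prototype assignment when only a bag-level label is available.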
Taxonomy
Topics: AI in cancer detection · Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications
