MLLM-HWSI: A Multimodal Large Language Model for Hierarchical Whole Slide Image Understanding

Basit Alawode; Arif Mahmood; Muaz Khalifa Al-Radi; Shahad Albastaki; Asim Khan; Muhammad Bilal; Moshira Ali Abdalla; Mohammed Bennamoun; Sajid Javed

arXiv:2603.23067·cs.CV·March 26, 2026

Open Access

TL;DR

MLLM-HWSI introduces a hierarchical multimodal large language model for whole slide image analysis, aligning visual features at multiple scales with pathology language to improve interpretability and performance across diagnostic tasks.

Contribution

The paper presents a novel hierarchical WSI-level MLLM that aligns multi-scale visual features with pathology language, enabling interpretable reasoning and state-of-the-art results.

Findings

01 Achieves new SOTA on 13 WSI benchmarks

02 Provides interpretable, evidence-grounded reasoning

03 Supports diverse pathology tasks like VQA and report generation

Abstract

Whole Slide Images (WSIs) exhibit a hierarchical structure, where diagnostic information emerges from cellular morphology, regional tissue organization, and global context. Existing Computational Pathology (CPath) Multimodal Large Language Models (MLLMs) typically compress an entire WSI into a single embedding, which hinders fine-grained grounding and ignores how pathologists synthesize evidence across different scales. We introduce MLLM-HWSI, a Hierarchical WSI-level MLLM that aligns visual features with pathology language at four distinct scales (cell as word, patch as phrase, region as sentence, and WSI as paragraph) to support interpretable, evidence-grounded reasoning. MLLM-HWSI decomposes each WSI into multi-scale embeddings with scale-specific projectors and jointly enforces (i) a hierarchical contrastive objective and (ii) a cross-scale consistency loss, preserving semantic…
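The abstract names two training signals, a hierarchical contrastive objective and a cross-scale consistency loss, but the listing is truncated before their definitions. The sketch below is one plausible reading, not the authors' formulation: it assumes a symmetric InfoNCE loss applied independently at each of the four scales, and a consistency term defined as one minus the cosine similarity between a coarse-scale embedding and the mean of its finer-scale children. All function names, the temperature of 0.07, and the equal per-scale weights are hypothetical choices made for illustration.

```python
import math

# Hypothetical scale names, taken from the abstract's four levels.
SCALES = ("cell", "patch", "region", "wsi")

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    # Guard against zero vectors so the sketch never divides by zero.
    norm = math.sqrt(dot(v, v)) or 1.0
    return [x / norm for x in v]

def info_nce(visual, text, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired (visual, text) embeddings.

    Assumption: pair i in `visual` matches pair i in `text`; every other
    combination in the batch is treated as a negative.
    """
    v = [normalize(x) for x in visual]
    t = [normalize(x) for x in text]
    logits = [[dot(vi, tj) / temperature for tj in t] for vi in v]

    def mean_cross_entropy(matrix):
        total = 0.0
        for i, row in enumerate(matrix):
            peak = max(row)  # subtract the max for numerical stability
            log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in row))
            total += log_sum_exp - row[i]
        return total / len(matrix)

    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (mean_cross_entropy(logits) + mean_cross_entropy(transposed))

def hierarchical_contrastive(visual_by_scale, text_by_scale, weights=None):
    """Weighted sum of per-scale InfoNCE losses (equal weights assumed)."""
    weights = weights or {s: 1.0 for s in SCALES}
    return sum(
        weights[s] * info_nce(visual_by_scale[s], text_by_scale[s])
        for s in SCALES
    )

def cross_scale_consistency(child_embeddings, parent_embedding):
    """1 - cosine(mean(children), parent); assumed form, not from the paper."""
    dim = len(parent_embedding)
    count = len(child_embeddings)
    mean_child = [sum(c[k] for c in child_embeddings) / count for k in range(dim)]
    return 1.0 - dot(normalize(mean_child), normalize(parent_embedding))
```

Under these assumptions, a training step would add the two terms, for example the hierarchical contrastive loss plus a weighted consistency term for each (cell, patch), (patch, region), and (region, WSI) pairing. Mean pooling is the simplest aggregation to state; an attention-weighted pooling would be an equally reasonable guess.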

Taxonomy

Topics: AI in cancer detection · Cell Image Analysis Techniques · Digital Imaging for Blood Diseases