Hepato-LLaVA: An Expert MLLM with Sparse Topo-Pack Attention for Hepatocellular Pathology Analysis on Whole Slide Images

Yuxuan Yang; Zhonghao Yan; Yi Zhang; Bo Yun; Muxi Diao; Guowei Zhao; Kongming Liang; Wenbin Li; Zhanyu Ma

arXiv:2602.19424·cs.CV·March 3, 2026

Open Access

TL;DR

Hepato-LLaVA is a specialized multi-modal large language model with a novel sparse attention mechanism that models tissue topology for improved hepatocellular pathology analysis on gigapixel whole slide images, supported by a new large dataset.

Contribution

We introduce Hepato-LLaVA, a multi-modal LLM with a sparse topology-aware attention mechanism and a new dataset, HepatoPathoVQA, for hepatocellular pathology analysis.

Findings

01 Achieves state-of-the-art performance on HCC diagnosis

02 Outperforms existing methods in captioning tasks

03 Effectively models tissue topology with sparse attention

Abstract

Hepatocellular Carcinoma diagnosis relies heavily on the interpretation of gigapixel Whole Slide Images. However, current computational approaches are constrained by fixed-resolution processing mechanisms and inefficient feature aggregation, which inevitably lead to either severe information loss or high feature redundancy. To address these challenges, we propose Hepato-LLaVA, a specialized Multi-modal Large Language Model designed for fine-grained hepatocellular pathology analysis. We introduce a novel Sparse Topo-Pack Attention mechanism that explicitly models 2D tissue topology. This mechanism effectively aggregates local diagnostic evidence into semantic summary tokens while preserving global context. Furthermore, to overcome the lack of multi-scale data, we present HepatoPathoVQA, a clinically grounded dataset comprising 33K hierarchically structured question-answer pairs validated…
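
The abstract does not spell out how Sparse Topo-Pack Attention is constructed, so the following is only an illustrative sketch of one way a sparse, topology-aware attention mask with summary tokens could be laid out. Everything in it is an assumption made for illustration and is not taken from the paper: the function name `topo_pack_mask`, the `pack` window size, the square windows on a row-major patch grid, and the rule that summary tokens attend to each other to carry global context.

```python
import numpy as np

def topo_pack_mask(grid_h, grid_w, pack):
    """Build a boolean attention mask (True = may attend).

    Hypothetical layout, not the paper's: grid_h * grid_w patch tokens in
    row-major order, followed by one summary token per pack x pack window.
    """
    packs_h = -(-grid_h // pack)  # ceiling division
    packs_w = -(-grid_w // pack)
    n_patch = grid_h * grid_w
    n_pack = packs_h * packs_w

    # Window (pack) index of every patch token, from its 2D grid position.
    rows, cols = np.divmod(np.arange(n_patch), grid_w)
    pack_id = (rows // pack) * packs_w + (cols // pack)

    n = n_patch + n_pack
    mask = np.zeros((n, n), dtype=bool)

    # Local evidence: patches attend only to patches in the same window.
    mask[:n_patch, :n_patch] = pack_id[:, None] == pack_id[None, :]

    # Aggregation: each patch and its window's summary token see each other.
    member = pack_id[:, None] == np.arange(n_pack)[None, :]
    mask[:n_patch, n_patch:] = member
    mask[n_patch:, :n_patch] = member.T

    # Global context: summary tokens attend to all summary tokens.
    mask[n_patch:, n_patch:] = True
    return mask

if __name__ == "__main__":
    m = topo_pack_mask(4, 4, 2)
    print(m.shape, int(m.sum()), "allowed pairs out of", m.size)
```

In a sketch like this, the mask would be applied to the attention scores before the softmax (for example, by setting disallowed positions to negative infinity), so each patch token only pays for its own window while the summary tokens carry information across windows.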

Taxonomy

Topics: AI in cancer detection · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning