LoC-Path: Learning to Compress for Pathology Multimodal Large Language Models

Qingqiao Hu; Weimin Lyu; Meilong Xu; Kehan Qi; Xiaoling Hu; Saumya Gupta; Jiawei Zhou; Chao Chen

arXiv:2512.05391·cs.CV·March 13, 2026

TL;DR

LoC-Path introduces a resource-efficient multimodal large language model for pathology that compresses gigapixel slide features, reducing computational costs while maintaining competitive performance.

Contribution

The paper proposes LoC-Path, a novel architecture that compresses slide features using sparse token merging and importance scoring, enabling efficient end-to-end pathology modeling.
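As a rough illustration of what merging redundant tile features can look like, the sketch below greedily folds each token into the first cluster whose centroid it closely matches. It is not the paper's Sparse Token Merger: the function name `merge_redundant_tokens`, the cosine-similarity criterion, and the `threshold` value are all assumptions chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def merge_redundant_tokens(tokens, threshold=0.95):
    """Hypothetical greedy merge: average together tokens that are near-duplicates.

    Each token joins the first existing cluster whose centroid has cosine
    similarity >= threshold; otherwise it starts a new cluster.
    Returns one centroid per cluster.
    """
    sums = []    # running per-cluster feature sums
    counts = []  # number of tokens in each cluster
    for tok in tokens:
        for i, s in enumerate(sums):
            centroid = [v / counts[i] for v in s]
            if cosine(tok, centroid) >= threshold:
                sums[i] = [a + b for a, b in zip(s, tok)]
                counts[i] += 1
                break
        else:
            # No sufficiently similar cluster was found.
            sums.append(list(tok))
            counts.append(1)
    return [[v / c for v in s] for s, c in zip(sums, counts)]


# Toy input: two nearly identical tile features and one distinct feature.
tiles = [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0]]
merged = merge_redundant_tokens(tiles)
print(len(tiles), "->", len(merged))
```

With a high threshold only near-duplicate features are merged, so the output length tracks how redundant the input is.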

Findings

01. LoC-Path reduces inference latency and memory usage significantly.

02. It maintains competitive accuracy compared to existing slide-level MLLMs.

03. The approach enables practical deployment under limited computational resources.

Abstract

Whole Slide Image (WSI) MLLMs are difficult to build and deploy because gigapixel slides induce thousands of visual tokens, while only a small fraction of regions is diagnostically relevant. Existing slide-level pathology MLLMs typically combine heavy slide-level encoders with long visual prefixes, making end-to-end slide-level development and deployment expensive under limited computational resources. We revisit this regime and show that WSI tile features are highly redundant at both global and local scales, while task-relevant evidence is sparse and query-dependent. We therefore introduce LoC-Path, a resource-efficient slide-level MLLM that compresses before fusion. LoC-Path uses a Sparse Token Merger (STM) and an MAE-pretrained resampler to replace expensive slide-level encoding with a compact latent interface, then uses a Token Importance Scorer (TIS) to select the most relevant…
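The claim that task-relevant evidence is sparse and query-dependent suggests scoring visual tokens against the query and keeping only a few. The sketch below shows that general idea under stated assumptions; it is not the paper's Token Importance Scorer. The function name `select_top_k`, the use of cosine similarity as the score, and the budget `k` are hypothetical.

```python
import math


def select_top_k(tokens, query, k):
    """Hypothetical query-dependent selection: keep the k tokens most similar to the query.

    Returns (kept_indices, kept_tokens), with indices in their original order
    so that any positional structure among the surviving tokens is preserved.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    scores = [cosine(t, query) for t in tokens]
    # Rank token indices by score, highest first, and truncate to the budget.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:k])
    return keep, [tokens[i] for i in keep]


# Toy input: four compressed visual tokens and one query embedding.
visual_tokens = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
query_embedding = [1.0, 0.0]
indices, selected = select_top_k(visual_tokens, query_embedding, k=2)
print(indices)
```

Because the scores depend on the query, a different question about the same slide would generally keep a different subset of tokens.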

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning