ReconMIL: Synergizing Latent Space Reconstruction with Bi-Stream Mamba for Whole Slide Image Analysis
Lubin Gan, Jing Zhang, Heng Zhang, Xin Di, Zhifeng Wang, Wenke Huang, Xiaoyan Sun

TL;DR
ReconMIL introduces a novel framework for whole slide image analysis that combines domain-specific feature projection with a bi-stream architecture to improve diagnostic localization and reduce background noise.
Contribution
It proposes a Latent Space Reconstruction module and a bi-stream architecture with a scale-adaptive mechanism to enhance feature relevance and localization in WSI analysis.
Findings
Outperforms state-of-the-art methods on multiple benchmarks.
Effectively localizes fine-grained diagnostic regions.
Balances global context with local details.
Abstract
Whole slide image (WSI) analysis heavily relies on multiple instance learning (MIL). While recent methods benefit from large-scale foundation models and advanced sequence modeling to capture long-range dependencies, they still struggle with two critical issues. First, directly applying frozen, task-agnostic features often leads to suboptimal separability due to the domain gap with specific histological tasks. Second, relying solely on global aggregators can cause over-smoothing, where sparse but critical diagnostic signals are overshadowed by the dominant background context. In this paper, we present ReconMIL, a novel framework designed to bridge this domain gap and balance global-local feature aggregation. Our approach introduces a Latent Space Reconstruction module that adaptively projects generic features into a compact, task-specific manifold, improving boundary delineation. To…
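The over-smoothing issue described in the abstract can be illustrated with a toy bag of patch scores. The sketch below is not ReconMIL's method: the function names, the top-k rule, and the score values are all assumptions chosen for illustration. It only shows how a purely global average can dilute a handful of strongly positive patches, while a simple local aggregator keeps them visible.

```python
from statistics import fmean


def global_mean_pool(scores):
    """Average every instance score in the bag (a purely global aggregator)."""
    return fmean(scores)


def local_topk_pool(scores, k):
    """Average only the k highest-scoring instances (a simple local aggregator)."""
    top = sorted(scores, reverse=True)[:k]
    return fmean(top)


# Hypothetical toy bag: 1000 patches, 5 of which carry a strong diagnostic signal.
bag = [0.0] * 995 + [1.0] * 5

global_score = global_mean_pool(bag)     # sparse signal is averaged away
local_score = local_topk_pool(bag, k=5)  # sparse signal is preserved

print(f"global mean: {global_score:.3f}")
print(f"top-5 mean:  {local_score:.3f}")
```

In this toy setting the global mean is dominated by the 995 background patches, which is the motivation the abstract gives for pairing a global stream with a stream that attends to local detail.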
Taxonomy
Topics: Cell Image Analysis Techniques · AI in cancer detection · Advanced Image and Video Retrieval Techniques
