Spatial Blindness in Whole-Slide Multiple Instance Learning
Xiangyu Li, Ran Su

TL;DR
This paper identifies spatial blindness in whole-slide MIL models: slide-level predictions stay nearly unchanged when patch coordinates are permuted, so spatial information is effectively ignored. It proposes ResTopoMIL, a simple architecture that improves spatial sensitivity and localization.
Contribution
The paper reveals the spatial blindness problem in MIL models and introduces ResTopoMIL, which enhances spatial awareness with minimal complexity.
Findings
ResTopoMIL improves classification and survival prediction across 9 benchmarks.
It restores sensitivity to coordinate perturbations.
It provides stronger localization evidence on CAMELYON-16.
Abstract
Whole-slide MIL models are often called context-aware once graphs, Transformers, or state-space modules are placed above patch embeddings. We show that this label can be deceptive. On pathology tasks where tissue architecture is part of the diagnostic signal, several strong MIL baselines retain nearly unchanged slide-level AUC after patch coordinates are permuted. Their predictions are accurate, but largely compositional. We refer to this failure mode as spatial blindness. Our explanation is optimization-based: dense appearance statistics are learned early under slide-level supervision, leaving weak gradients for sparse spatial relations. ResTopoMIL addresses the issue by first fitting a permutation-invariant prototype histogram and then freezing it while a lightweight graph branch learns the residual under a coordinate-shuffling constraint. The architecture is simple by design; the…
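The coordinate-permutation check described in the abstract can be illustrated with a small sketch. This is not the authors' evaluation code. It is a toy example, assuming scalar patch features on a 10x10 grid and two hypothetical scoring functions (`compositional_score` and `spatial_score`) that stand in for a spatially blind model and a coordinate-aware model. The paper compares slide-level AUC before and after shuffling; the sketch compares a single slide's score, which is a simplification.

```python
import math
import random


def shuffle_coordinates(coords, rng):
    """Return the coordinates in random order, so each patch feature
    is paired with another patch's location."""
    shuffled = list(coords)
    rng.shuffle(shuffled)
    return shuffled


def compositional_score(features, coords):
    """Hypothetical spatially blind model: mean-pools the features
    and never reads the coordinates."""
    return sum(features) / len(features)


def spatial_score(features, coords, k=4):
    """Hypothetical coordinate-aware model: mean absolute feature
    difference between each patch and its k nearest spatial neighbours."""
    total = 0.0
    for i, ci in enumerate(coords):
        others = [j for j in range(len(coords)) if j != i]
        others.sort(key=lambda j: math.dist(ci, coords[j]))
        total += sum(abs(features[i] - features[j]) for j in others[:k]) / k
    return total / len(coords)


def shuffle_gap(score_fn, features, coords, rng):
    """Absolute change in a model's output when coordinates are shuffled."""
    original = score_fn(features, coords)
    perturbed = score_fn(features, shuffle_coordinates(coords, rng))
    return abs(original - perturbed)


# Toy slide (an assumption for illustration): a 10x10 grid of patches in
# which the left half has feature 1.0 and the right half has feature 0.0.
coords = [(float(x), float(y)) for y in range(10) for x in range(10)]
features = [1.0 if x < 5 else 0.0 for (x, y) in coords]
rng = random.Random(0)

print("compositional gap:", shuffle_gap(compositional_score, features, coords, rng))
print("spatial gap:", shuffle_gap(spatial_score, features, coords, rng))
```

The mean-pooling model has a gap of exactly zero by construction, which is the behaviour the abstract calls spatial blindness. The neighbour-based model changes its output once the clustered layout is destroyed, which is the sensitivity the paper reports ResTopoMIL restoring.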
