Exploiting Low-Dimensional Manifold of Features for Few-Shot Whole Slide Image Classification
Conghao Xiong, Zhengrui Guo, Zhe Xu, Yifei Zhang, Raymond Kai-Yu Tong, Si Yong Yeo, Hao Chen, Joseph J. Y. Sung, Irwin King

TL;DR
This paper introduces a geometry-aware residual module called the Manifold Residual (MR) block, which preserves the low-dimensional feature manifold in few-shot Whole Slide Image classification, leading to improved performance with fewer parameters.
Contribution
The paper proposes the MR block, a novel plug-and-play module that maintains feature manifold geometry and enhances few-shot WSI classification.
Findings
Achieves state-of-the-art results on few-shot WSI tasks.
Reduces model complexity with fewer parameters.
Demonstrates the importance of geometry-aware modules in medical imaging.
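To make the parameter claim concrete, here is a rough count under assumed sizes. The 512-dimensional width, the presence of a bias term, and the function names are hypothetical choices for illustration; the rank of 32 is the saturation point one reviewer cites from the paper's sensitivity analysis. This is a back-of-the-envelope sketch, not the paper's reported figures.

```python
def dense_params(d_in, d_out):
    # A standard linear layer trains one full weight matrix plus a bias.
    return d_in * d_out + d_out


def low_rank_params(d_in, d_out, rank):
    # Two thin trainable factors (d_in x rank, rank x d_out) plus a bias.
    # A fixed random matrix contributes no trainable parameters.
    return d_in * rank + rank * d_out + d_out


d_in, d_out, rank = 512, 512, 32  # hypothetical sizes, not from the paper
full = dense_params(d_in, d_out)
thin = low_rank_params(d_in, d_out, rank)
print(full, thin, round(thin / full, 3))
```

Under these assumptions the low-rank pathway trains roughly one eighth of the weights of the dense layer it replaces.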
Abstract
Few-shot Whole Slide Image (WSI) classification is severely hampered by overfitting. We argue that this is not merely a data-scarcity issue but a fundamentally geometric problem. Grounded in the manifold hypothesis, our analysis shows that features from pathology foundation models exhibit a low-dimensional manifold geometry that is easily perturbed by downstream models. This insight reveals a key potential issue in downstream multiple instance learning models: linear layers are geometry-agnostic and, as we show empirically, can distort the manifold geometry of the features. To address this, we propose the Manifold Residual (MR) block, a plug-and-play module that is explicitly geometry-aware. The MR block reframes the linear layer as residual learning and decouples it into two pathways: (1) a fixed, random matrix serving as a geometric anchor that approximately preserves topology while…
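A minimal sketch of the two-pathway idea, assuming NumPy and hypothetical dimensions. The abstract is truncated on this page, so the second pathway is taken from the reviewers' description (a trainable low-rank residual). The names `MRBlockSketch`, `anchor`, `A`, and `B`, the Gaussian scaling, and the zero initialization are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


class MRBlockSketch:
    """Illustrative layer: fixed random anchor plus low-rank residual."""

    def __init__(self, d_in, d_out, rank, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed random matrix, never updated. The 1/sqrt(d_out) scaling is an
        # assumption chosen so squared norms are preserved in expectation.
        self.anchor = rng.standard_normal((d_in, d_out)) / np.sqrt(d_out)
        # Low-rank factors that would be trained. B starts at zero (an
        # assumption) so the block initially equals the anchor pathway.
        self.A = rng.standard_normal((d_in, rank)) * 0.01
        self.B = np.zeros((rank, d_out))

    def forward(self, x):
        # x: (n_instances, d_in) patch features from a frozen encoder.
        return x @ self.anchor + (x @ self.A) @ self.B


def pairwise_dists(z):
    # Euclidean distances between all rows of z.
    sq = np.sum(z**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * z @ z.T
    return np.sqrt(np.maximum(d2, 0.0))


rng = np.random.default_rng(1)
x = rng.standard_normal((64, 512))  # synthetic stand-in for patch features
block = MRBlockSketch(d_in=512, d_out=512, rank=32)
y = block.forward(x)

# Compare pairwise distances before and after the block (upper triangle only).
iu = np.triu_indices(64, k=1)
ratio = pairwise_dists(y)[iu] / pairwise_dists(x)[iu]
print(ratio.min(), ratio.max())
```

With the residual at zero, the printed ratios stay close to 1, which is the sense in which a random matrix can act as an approximate geometric anchor. How the residual is trained is not covered by this sketch.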
Peer Reviews
Decision: ICLR 2026 Poster
- The authors present a relevant analysis highlighting the low-dimensional manifold properties of a range of pathology foundation models.
- They propose a novel layer for few-shot MIL, the MR block, with a custom training strategy.
- They provide theoretical results on a range of geometric/statistical properties preserved under perturbation by random matrices.
- They demonstrate a universal approximation theorem for the MR block.
- They show on 3 datasets that the MR blocks, instead of linear layers, within 5
- **W1: clarity.** There are several points in the paper that would benefit from clarification and/or further detail:
  - a) L63: "linear layer". For readers familiar with the MIL literature, it is not clear at this stage which linear layers you are referring to, e.g. those included in the gated-attention layer of ABMIL or the linear classifier at the end of the architecture, which can also have an effect. This should be clarified.
  - b) The dataset used for the geometric studies repor
1. The paper identifies a real and practically significant issue in computational pathology. The connection between feature geometry and data efficiency is conceptually interesting and relevant to current efforts in adapting large pretrained models for medical imaging.
2. The proposed MR block is lightweight, easy to implement, and compatible with a wide range of MIL backbones. It can be viewed as a structured parameter-efficient adapter.
3. The paper reports consistent accuracy gains across m
Major: 1. The paper attributes few-shot overfitting to the “destruction” of pretrained feature manifolds by downstream linear layers. This interpretation is not entirely convincing. Linear mappings are expected to reshape representations to achieve class separability, which is the very purpose of a classifier. The observed overfitting could instead result from limited data or excessive model capacity rather than geometric distortion. The causal link between ‘destruction’ and overfitting is not
1) The study provides quantitative proof that CONCH features exhibit a low-dimensional manifold with nonlinear curvature, which linear layers disrupt.
2) The study proposes **MR Block Innovation** with a fixed random geometric anchor and a trainable low-rank residual pathway, reducing overfitting and parameter count.
3) The study provides extensive validation demonstrating the generalization of the proposed method.
4) **MR Block Innovation** demonstrates SOTA performance on three dataset
1) The study does not provide a comparison with SOTA methods for whole slide image classification in few-shot settings, such as MGPATH [3], MSCPT [2], FOCUS [3].
2) The study does not report inference time and FLOPs for the proposed method.
3) The study does not fully explain the effect of rank on the model's performance. For example, the sensitivity analysis (Fig. 3) shows that performance saturates around a rank of **r=32**. The authors note this **aligns remarkably** with the features effe
Taxonomy
Topics: Image Processing Techniques and Applications
