TL;DR
This paper introduces a new benchmark and metric for evaluating histopathology foundation models (FMs) on skin cancer subtyping, demonstrating that less biased feature extraction improves classification on multi-center datasets.
Contribution
It presents a novel benchmark and the FM-SI metric for assessing histopathology foundation models in a real-world, multi-center setting.
Findings
Less biased feature extraction improves classification accuracy.
The FM-SI metric effectively measures model consistency under distribution shifts.
Benchmarking reveals variability in model performance across datasets.
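The abstract is cut off before it defines FM-SI, so its exact formula is not given here. Assuming, as the name suggests, that it builds on the classic silhouette index, the sketch below computes the standard mean silhouette coefficient (Euclidean distance) of embeddings grouped by an arbitrary label. Grouping embeddings by medical center is one plausible way such a score could expose center-specific bias: a score near 1 means embeddings cluster tightly by center, while a score near 0 or below means they do not. The function name and this interpretation are illustrative assumptions, not the paper's definition of FM-SI.

```python
import math
from typing import Dict, List, Sequence


def silhouette_score(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean silhouette coefficient (classic definition, Euclidean distance).

    Illustrative sketch only; NOT the paper's FM-SI formula, which the
    truncated abstract does not state.
    """
    # Group point indices by label (e.g. a hypothetical medical-center id).
    clusters: Dict[int, List[int]] = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two groups")

    total = 0.0
    for i, point in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # singleton group: s(i) = 0 by convention
        # a(i): mean distance to the other members of i's own group.
        a = sum(math.dist(point, points[j]) for j in own if j != i) / (len(own) - 1)
        # b(i): smallest mean distance to the members of any other group.
        b = min(
            sum(math.dist(point, points[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        denom = max(a, b)
        if denom > 0:  # identical points contribute s(i) = 0
            total += (b - a) / denom
    return total / len(points)
```

For example, `silhouette_score(embeddings, center_ids)` would score how strongly a list of feature vectors separates by a list of integer center labels.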
Abstract
Pretraining on large-scale, in-domain datasets grants histopathology foundation models (FMs) the ability to learn task-agnostic data representations, enhancing transfer learning on downstream tasks. In computational pathology, automated whole slide image analysis requires multiple instance learning (MIL) frameworks due to the gigapixel scale of the slides. The diversity among histopathology FMs has highlighted the need to design real-world challenges for evaluating their effectiveness. To bridge this gap, our work presents a novel benchmark for evaluating histopathology FMs as patch-level feature extractors within a MIL classification framework. For that purpose, we leverage the AI4SkIN dataset, a multi-center cohort encompassing slides with challenging cutaneous spindle cell neoplasm subtypes. We also define the Foundation Model - Silhouette Index (FM-SI), a novel metric to measure…
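The abstract describes FMs used as frozen patch-level feature extractors inside a MIL classifier, but it does not name the aggregation step. As an illustration only, the sketch below implements attention-based MIL pooling in the style of Ilse et al. (2018, non-gated variant): each patch embedding h_k receives a weight a_k = softmax_k(w^T tanh(V h_k)), and the slide embedding is the attention-weighted sum of the patch embeddings. In practice V and w are learned and the slide embedding feeds a subtype classifier; the function names and toy weights here are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def matvec(m: Sequence[Sequence[float]], v: Sequence[float]) -> Vector:
    """Dense matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def attention_mil_pool(
    bag: Sequence[Sequence[float]],
    V: Sequence[Sequence[float]],
    w: Sequence[float],
) -> Tuple[Vector, Vector]:
    """Attention-based MIL pooling (sketch, not the paper's exact aggregator).

    bag: one embedding per patch, e.g. from a frozen foundation model
    V:   hidden_dim x feat_dim projection matrix (normally learned)
    w:   hidden_dim attention vector (normally learned)
    Returns (slide_embedding, attention_weights).
    """
    # Unnormalized attention score per patch: w^T tanh(V h_k).
    scores = [sum(wi * math.tanh(u) for wi, u in zip(w, matvec(V, h))) for h in bag]
    # Softmax over patches; subtracting the max keeps exp() numerically stable.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]
    # Slide embedding: attention-weighted average of patch embeddings.
    dim = len(bag[0])
    slide = [sum(a * h[d] for a, h in zip(attn, bag)) for d in range(dim)]
    return slide, attn
```

Because the weights sum to 1, the slide embedding always has the same dimension as the patch features, regardless of how many patches a gigapixel slide yields, which is what lets a fixed-size classifier sit on top.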
