TL;DR
THUNDER is a comprehensive benchmark for evaluating tile-level digital pathology models, focusing on performance, feature analysis, robustness, and uncertainty across diverse datasets and tasks.
Contribution
THUNDER introduces a dynamic, easy-to-use benchmark for comparing 23 foundation models across multiple datasets, with an emphasis on robustness and uncertainty in digital pathology.
Findings
23 models evaluated across 16 datasets
Insights into feature spaces and robustness
Benchmark supports diverse downstream tasks
Abstract
Progress in a research field can be hard to assess, in particular when many concurrent methods are proposed in a short period of time. This is the case in digital pathology, where many foundation models have been released recently to serve as feature extractors for tile-level images, being used in a variety of downstream tasks, both for tile- and slide-level problems. Benchmarking available methods then becomes paramount to get a clearer view of the research landscape. In particular, in critical domains such as healthcare, a benchmark should not only focus on evaluating downstream performance, but also provide insights about the main differences between methods, and importantly, further consider uncertainty and robustness to ensure a reliable usage of proposed models. For these reasons, we introduce THUNDER, a tile-level benchmark for digital pathology foundation models, allowing for…