Diversity Over Scale: Whole-Slide Image Variety Enables H&E Foundation Model Training with Fewer Patches
Christoph Bosch, John K.L. Wong, Martin Paulikat, Myroslav Zapukhlyak, Bharti Arora, Manasi Aichmüller-Ratnaparkhe, Jens Baumann, Shivani Karn, Rutuja Kamble, Swapnil Karnik, Bhushan Khedkar, Serey Vathana Chhut, Witali Aswolinskiy, Christian Aichmüller

TL;DR
This study demonstrates that increasing data diversity across whole-slide images enables training effective histopathology foundation models with fewer patches, challenging the notion that larger datasets are always better.
Contribution
We introduce Athena, a histopathology foundation model trained on fewer patches by emphasizing data diversity over volume, achieving competitive performance.
Findings
Athena approaches the state of the art on a patch-level benchmark.
Athena surpasses models trained on larger datasets on slide-level tasks.
Diversity across whole-slide images is crucial for effective model training.
Abstract
Rapid progress in computational pathology is increasingly driven by vision foundation models pretrained on vast histopathology datasets. While recent efforts have prioritized training on an ever-larger number of patches, we take an alternative approach focused on data diversity. Our foundation model, Athena, was initialized from a pretrained model and trained on just 115 million tissue patches, several times fewer than recent histopathology foundation models. Rather than relying on patch volume or complex sampling heuristics, we maximize data diversity by randomly selecting only a moderate number of patches per whole-slide image from our diverse internal repository, which spans multiple countries, institutions, and scanner types. Evaluated on a single patch-level benchmark and four slide-level downstream tasks (two molecular and two morphological), Athena approaches the state-of-the-art…
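The sampling idea described in the abstract (randomly selecting a moderate number of patches from each whole-slide image) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `sample_patches_per_slide`, the `max_per_slide` cap, the toy coordinates, and the fixed seed are all assumptions made for illustration, and the abstract does not state the actual per-slide patch count.

```python
import random


def sample_patches_per_slide(slide_to_patches, max_per_slide, seed=0):
    """Pick up to `max_per_slide` patches uniformly at random from every slide.

    `slide_to_patches` maps a slide ID to a list of patch identifiers
    (here: hypothetical (x, y) tile coordinates). Slides with fewer patches
    than the cap contribute all of their patches.
    """
    rng = random.Random(seed)  # seeded for reproducibility (an assumption)
    selected = {}
    for slide_id in sorted(slide_to_patches):  # sorted for a stable order
        patches = slide_to_patches[slide_id]
        k = min(max_per_slide, len(patches))
        # random.sample draws k distinct items without replacement
        selected[slide_id] = rng.sample(patches, k)
    return selected


# Toy data: one large slide and one small slide (hypothetical coordinates).
slides = {
    "slide_a": [(x, y) for x in range(10) for y in range(10)],  # 100 patches
    "slide_b": [(x, 0) for x in range(3)],                      # 3 patches
}

subset = sample_patches_per_slide(slides, max_per_slide=5)
counts = {slide_id: len(patches) for slide_id, patches in subset.items()}
print(counts)
```

Capping the draw per slide means that no single slide dominates the training set, so the total patch budget is spread across as many slides, institutions, and scanners as the repository contains.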
Taxonomy
Topics: AI in cancer detection · Cell Image Analysis Techniques · Digital Imaging for Blood Diseases
