Terabyte-scale Deep Multiple Instance Learning for Classification and Localization in Pathology
Gabriele Campanella, Vitor Werneck Krauss Silva, Thomas J. Fuchs

TL;DR
This paper introduces a deep learning framework for pathology that combines a 12,160-slide dataset with multiple instance learning (MIL) to diagnose prostate cancer from slides using only slide-level diagnoses, with no pixel-level annotations.
Contribution
The study presents a terabyte-scale pathology dataset (12,160 slides, two orders of magnitude larger than previous pathology datasets) and demonstrates a deep MIL approach that reaches an AUC of 0.98 on the test set for prostate cancer diagnosis.
Findings
Achieved an AUC of 0.98 on the test set
Used a dataset of 12,160 slides, equivalent to 25 times the pixel count of the entire ImageNet dataset
Enabled scalable training from slide-level diagnoses alone, without pixel-wise labels
Abstract
In the field of computational pathology, the use of decision support systems powered by state-of-the-art deep learning solutions has been hampered by the lack of large labeled datasets. Until recently, studies relied on datasets on the order of a few hundred slides, which is not enough to train a model that can work at scale in the clinic. Here, we have gathered a dataset consisting of 12,160 slides, two orders of magnitude larger than previous datasets in pathology and equivalent to 25 times the pixel count of the entire ImageNet dataset. Given the size of our dataset, it is possible for us to train a deep learning model under the Multiple Instance Learning (MIL) assumption, in which only the overall slide diagnosis is necessary for training, avoiding all the expensive pixel-wise annotations that are usually part of supervised learning approaches. We test our framework on a complex task,…
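The MIL assumption mentioned in the abstract can be illustrated with a small sketch. It assumes the standard formulation: a slide is a bag of tiles, a positive slide contains at least one positive tile, and a negative slide contains none, so a slide-level score can be taken as the maximum tile probability. The function names, the 0.5 threshold, and the example probabilities are hypothetical; this is not the authors' pipeline, only an instance of max-pooling MIL aggregation.

```python
from typing import Sequence


def slide_score(tile_probs: Sequence[float]) -> float:
    """Max-pooling MIL aggregation: the slide score is the highest tile probability."""
    if not tile_probs:
        raise ValueError("a slide (bag) must contain at least one tile")
    return max(tile_probs)


def top_tile_index(tile_probs: Sequence[float]) -> int:
    """Index of the tile that drives the slide score (a coarse localization cue)."""
    if not tile_probs:
        raise ValueError("a slide (bag) must contain at least one tile")
    return max(range(len(tile_probs)), key=lambda i: tile_probs[i])


def predict_slide(tile_probs: Sequence[float], threshold: float = 0.5) -> bool:
    """A slide is called positive if at least one tile exceeds the threshold."""
    return slide_score(tile_probs) >= threshold


# Hypothetical tile probabilities from some tile-level classifier.
positive_like = [0.05, 0.92, 0.10]
negative_like = [0.05, 0.12, 0.10]

print(predict_slide(positive_like), top_tile_index(positive_like))
print(predict_slide(negative_like))
```

Under this assumption, only the slide-level diagnosis is needed as a training target, because the loss can be applied to the top-scoring tile of each slide rather than to individually annotated pixels or tiles.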
Taxonomy
Topics: AI in cancer detection · Digital Imaging for Blood Diseases · Cell Image Analysis Techniques
