STARC-9: A Large-scale Dataset for Multi-Class Tissue Classification for CRC Histopathology
Barathi Subramanian, Rathinaraja Jeyaraj, Mitchell Nevin Peterson, Terry Guo, Nigam Shah, Curtis Langlotz, Andrew Y. Ng, Jeanne Shen

TL;DR
STARC-9 is a large, diverse, high-quality dataset of colorectal cancer (CRC) histopathology images for training multi-class tissue classification models, addressing the limited morphologic diversity, class imbalance, and low-quality tiles of existing public datasets.
Contribution
The paper introduces STARC-9, a large-scale dataset of 630,000 tiles curated with DeepCluster++, a novel semi-automated framework that improves intra-class diversity and tile quality while reducing manual curation for CRC tissue classification.
Findings
Models trained on STARC-9 outperform those trained on existing public CRC datasets.
The dataset improves model generalizability across different architectures.
DeepCluster++ effectively ensures intra-class diversity and reduces manual effort.
Abstract
Multi-class tissue-type classification of colorectal cancer (CRC) histopathologic images is a significant step in the development of downstream machine learning models for diagnosis and treatment planning. However, existing public CRC datasets often lack morphologic diversity, suffer from class imbalance, and contain low-quality image tiles, limiting model performance and generalizability. To address these issues, we introduce STARC-9 (STAnford coloRectal Cancer), a large-scale dataset for multi-class tissue classification. STARC-9 contains 630,000 hematoxylin and eosin-stained image tiles uniformly sampled across nine clinically relevant tissue classes (70,000 tiles per class) from 200 CRC patients at the Stanford University School of Medicine. The dataset was built using a novel framework, DeepCluster++, designed to ensure intra-class diversity and reduce manual curation. First, an…
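The abstract's class-balanced design (a fixed 70,000 tiles for each of nine classes, so 9 × 70,000 = 630,000 tiles) can be illustrated with a minimal sampler. This is a hypothetical sketch, not the authors' pipeline and not DeepCluster++: it assumes candidate tiles are already labeled and simply draws the same number per class uniformly at random without replacement, whereas STARC-9's curation additionally targets intra-class diversity. The function name `balanced_sample` and the synthetic toy pool are illustrative assumptions.

```python
import random
from collections import defaultdict

# Figures stated in the abstract; the class labels used below are hypothetical.
NUM_CLASSES = 9
TILES_PER_CLASS = 70_000


def balanced_sample(tiles, per_class, seed=0):
    """Hypothetical helper: draw exactly `per_class` tiles for every label.

    `tiles` is an iterable of (tile_id, label) pairs. Sampling is uniform
    without replacement within each class. Raises ValueError if any class has
    fewer than `per_class` candidate tiles.
    """
    by_label = defaultdict(list)
    for tile_id, label in tiles:
        by_label[label].append(tile_id)

    rng = random.Random(seed)  # seeded so the selection is reproducible
    selected = {}
    # Iterate in sorted label order so the random draws are deterministic.
    for label, ids in sorted(by_label.items()):
        if len(ids) < per_class:
            raise ValueError(f"class {label!r} has only {len(ids)} tiles")
        selected[label] = rng.sample(ids, per_class)
    return selected


# Sanity check of the dataset size stated in the abstract.
assert NUM_CLASSES * TILES_PER_CLASS == 630_000

if __name__ == "__main__":
    # Synthetic, deliberately imbalanced pool: class c has 10 + c candidate tiles.
    pool = [(f"c{c}_t{i}", f"class_{c}")
            for c in range(NUM_CLASSES) for i in range(10 + c)]
    subset = balanced_sample(pool, per_class=5)
    print({label: len(ids) for label, ids in subset.items()})
```

Every class ends up with the same count regardless of how imbalanced the candidate pool is, which is the property the abstract contrasts with existing datasets; choosing *which* tiles fill each class quota is where a diversity-aware method such as DeepCluster++ would differ from plain random sampling.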
