Lung and Colon Cancer Histopathological Image Dataset (LC25000)
Andrew A. Borkowski, Marilyn M. Bui, L. Brannon Thomas, Catherine P. Wilson, Lauren A. DeLand, Stephen M. Mastorides

TL;DR
The paper introduces LC25000, a validated histopathological image dataset of 25,000 images in five classes of lung and colon tissue (three malignant, two benign), intended to support machine learning research in medical diagnosis.
Contribution
It provides a new, publicly available, ML-ready dataset of lung and colon histopathological images, addressing the scarcity of ML-ready image datasets in cancer pathology.
Findings
Dataset contains 25,000 color images, 5,000 in each of 5 classes.
Images are de-identified and HIPAA compliant.
Dataset is validated and freely available.
Abstract
The field of Machine Learning, a subset of Artificial Intelligence, has led to remarkable advancements in many areas, including medicine. Machine Learning algorithms require large datasets to train computer models successfully. Although there are medical image datasets available, more image datasets are needed from a variety of medical entities, especially cancer pathology. Even more scarce are ML-ready image datasets. To address this need, we created an image dataset (LC25000) with 25,000 color images in 5 classes. Each class contains 5,000 images of the following histologic entities: colon adenocarcinoma, benign colonic tissue, lung adenocarcinoma, lung squamous cell carcinoma, and benign lung tissue. All images are de-identified, HIPAA compliant, validated, and freely available for download to AI researchers.
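The class structure described in the abstract can be made concrete with a minimal Python sketch. It indexes a local copy of LC25000 and checks the per-class counts stated above. It assumes a hypothetical layout with one sub-folder per class under a common root, using the short folder names in `CLASS_DIRS`. Adjust those names and `IMAGE_SUFFIXES` to match the files you actually download. `index_dataset` and `check_counts` are illustrative helpers, not part of the dataset release.

```python
from collections import Counter
from pathlib import Path

# The five LC25000 classes from the abstract. The short folder names are a
# hypothetical layout (one sub-folder per class under a common root).
CLASS_DIRS = {
    "colon_aca": "colon adenocarcinoma",
    "colon_n": "benign colonic tissue",
    "lung_aca": "lung adenocarcinoma",
    "lung_n": "benign lung tissue",
    "lung_scc": "lung squamous cell carcinoma",
}
# Assumed image file extensions; adjust to match the files you download.
IMAGE_SUFFIXES = {".jpeg", ".jpg", ".png"}
# Per the abstract: 5 classes x 5,000 images = 25,000 images in total.
EXPECTED_PER_CLASS = 5_000


def index_dataset(root):
    """Return (path, label) pairs for every image, with labels 0-4 in sorted folder order."""
    root = Path(root)
    samples = []
    for label, folder in enumerate(sorted(CLASS_DIRS)):
        class_dir = root / folder
        if not class_dir.is_dir():
            raise FileNotFoundError(f"missing class folder: {class_dir}")
        # Sort for a deterministic ordering; skip non-image files such as notes.
        for path in sorted(class_dir.iterdir()):
            if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((path, label))
    return samples


def check_counts(samples, expected=EXPECTED_PER_CLASS):
    """Count images per class folder and return (report, classes whose count differs)."""
    names = sorted(CLASS_DIRS)
    counts = Counter(label for _, label in samples)
    report = {name: counts.get(i, 0) for i, name in enumerate(names)}
    mismatched = {name: n for name, n in report.items() if n != expected}
    return report, mismatched
```

On a complete copy in this layout, `check_counts(index_dataset(root))` should report 5,000 images for every class and return an empty `mismatched` dict. Any entry in `mismatched` points to missing or extra files. The labels come from sorted folder names so that they stay stable across runs.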
Taxonomy
Topics: Radiomics and Machine Learning in Medical Imaging · AI in cancer detection · Artificial Intelligence in Healthcare and Education
