Predicting Ki67, ER, PR, and HER2 Statuses from H&E-stained Breast Cancer Images
Amir Akbarnejad, Nilanjan Ray, Penny J. Barnes, Gilbert Bigras

TL;DR
This study demonstrates that machine learning, specifically a ViT-based model, can accurately predict molecular markers like Ki67, ER, PR, and HER2 from H&E-stained breast cancer images with around 90% AUC, using a large, carefully curated dataset.
Contribution
The paper introduces a large-scale, high-quality dataset of H&E and IHC images for breast cancer markers and shows that a ViT-based model can predict these markers with high accuracy from histomorphology.
Findings
Achieved around 90% AUC in predicting molecular markers.
Developed a large, curated dataset of 185,538 images.
Demonstrated the model's ability to localize relevant histological regions.
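The headline metric above is AUC. As a minimal sketch of what that number measures (not the authors' evaluation code), AUC can be computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the paper.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 ground-truth marker statuses
    scores: iterable of model scores, higher meaning "more likely positive"
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy data (illustrative only, not results from the paper).
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.3, 0.4, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise form is quadratic in the number of samples, which is fine for illustration; a rank-based implementation would be preferable on a dataset of this size.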
Abstract
Despite the advances in machine learning and digital pathology, it is not yet clear if machine learning methods can accurately predict molecular information merely from histomorphology. In a quest to answer this question, we built a large-scale dataset (185,538 images) with reliable measurements for Ki67, ER, PR, and HER2 statuses. The dataset is composed of mirrored images of H&E and corresponding images of immunohistochemistry (IHC) assays (Ki67, ER, PR, and HER2). These images are mirrored through registration. To increase reliability, individual pairs were inspected and discarded if artifacts were present (tissue folding, bubbles, etc.). Measurements for Ki67, ER, and PR were determined by calculating H-Score from image analysis. HER2 measurement is based on binary classification: 0 and 1+ (IHC scores representing a negative subset) vs 3+ (IHC score positive subset). Cases with IHC…
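The labeling scheme described in the abstract can be sketched as follows. This assumes the conventional H-Score definition (1 × % weakly stained cells + 2 × % moderately stained + 3 × % strongly stained, giving a range of 0 to 300); the abstract does not state which exact formula or image-analysis pipeline the authors used. The function names `h_score` and `her2_binary_label` are hypothetical. Because the abstract is truncated, the handling of HER2 scores other than 0, 1+, and 3+ is left unspecified here.

```python
def h_score(pct_weak, pct_moderate, pct_strong):
    """Conventional H-Score from staining-intensity percentages (0 to 300).

    Assumption: standard weighting of 1, 2, 3 for weak, moderate, strong.
    """
    parts = (pct_weak, pct_moderate, pct_strong)
    if min(parts) < 0 or sum(parts) > 100:
        raise ValueError("percentages must be non-negative and sum to at most 100")
    return 1 * pct_weak + 2 * pct_moderate + 3 * pct_strong


def her2_binary_label(ihc_score):
    """Map a HER2 IHC score string to the binary label described in the abstract.

    "0" and "1+" form the negative subset (0); "3+" is the positive subset (1).
    Any other score returns None, since the truncated abstract does not say
    how such cases are treated.
    """
    if ihc_score in ("0", "1+"):
        return 0
    if ihc_score == "3+":
        return 1
    return None


# Illustrative values only, not measurements from the dataset.
example_h = h_score(pct_weak=20, pct_moderate=30, pct_strong=10)
print(example_h)
print([her2_binary_label(s) for s in ("0", "1+", "3+")])
```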
Taxonomy
Topics: AI in cancer detection · Radiomics and Machine Learning in Medical Imaging · Gene expression and cancer classification
