Lizard: A Large-Scale Dataset for Colonic Nuclear Instance Segmentation and Classification

Simon Graham; Mostafa Jahanifar; Ayesha Azam; Mohammed Nimir; Yee-Wah Tsang; Katherine Dodd; Emily Hero; Harvir Sahota; Atisha Tank; Ksenija Benes; Noorul Wahab; Fayyaz Minhas; Shan E Ahmed Raza; Hesham El Daly; Kishore Gopalakrishnan; David Snead; Nasir Rajpoot

arXiv:2108.11195 · cs.CV · November 30, 2021

TL;DR

This paper introduces Lizard, a large-scale dataset for colonic nuclear instance segmentation and classification. It was created through a multi-stage annotation pipeline with pathologist-in-the-loop refinement, with the aim of supporting the development of better computational pathology models.

Contribution

The paper presents the largest nuclear instance segmentation and classification dataset for colon tissue, built with a novel multi-stage annotation process that incorporates input from pathologists.

Findings

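1. Largest colonic nuclear instance segmentation and classification dataset, with nearly 500,000 labeled nuclei

2. Effective multi-stage annotation pipeline with pathologist-in-the-loop refinement

3. Supports the development of improved nuclear segmentation and classification models for computational pathology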
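To make "instance segmentation and classification" labels concrete, the following is a minimal sketch of how labeled nuclei could be counted in one image, overall and per class. It assumes a common label representation: an integer instance map in which 0 marks background and each nucleus has a unique positive id, plus a mapping from each id to an integer class label. The function name `count_nuclei` and this label layout are illustrative assumptions, not Lizard's documented file format.

```python
import numpy as np


def count_nuclei(inst_map: np.ndarray, inst_class: dict[int, int]) -> tuple[int, dict[int, int]]:
    """Count labeled nuclei in one image and tally them per class.

    Assumed (hypothetical) label format:
      inst_map   - 2D integer array; 0 is background, each nucleus has a unique positive id
      inst_class - mapping from instance id to an integer class label
    """
    ids = np.unique(inst_map)
    ids = ids[ids != 0]  # drop the background label
    per_class: dict[int, int] = {}
    for inst_id in ids:
        cls = inst_class[int(inst_id)]
        per_class[cls] = per_class.get(cls, 0) + 1
    return len(ids), per_class


# Toy example: three nuclei (ids 1, 2, 3); ids 1 and 2 share class 2, id 3 has class 5.
inst_map = np.array([[0, 1, 1, 0],
                     [0, 1, 0, 2],
                     [3, 0, 0, 2]])
inst_class = {1: 2, 2: 2, 3: 5}
total, per_class = count_nuclei(inst_map, inst_class)
print(total, per_class)  # 3 {2: 2, 5: 1}
```

Summing such per-image totals over every annotated region is one straightforward way a dataset-level nucleus count could be obtained under this assumed format.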

Abstract

The development of deep segmentation models for computational pathology (CPath) can help foster the investigation of interpretable morphological biomarkers. Yet, there is a major bottleneck in the success of such approaches because supervised deep learning models require an abundance of accurately labelled data. This issue is exacerbated in the field of CPath because the generation of detailed annotations usually demands the input of a pathologist to be able to distinguish between different tissue constructs and nuclei. Manually labelling nuclei may not be a feasible approach for collecting large-scale annotated datasets, especially when a single image region can contain thousands of different cells. However, solely relying on automatic generation of annotations will limit the accuracy and reliability of ground truth. Therefore, to help overcome the above challenges, we propose a…
