Cluster-First Labelling: An Automated Pipeline for Segmentation and Morphological Clustering in Histology Whole Slide Images
Muhammad Haseeb Ahmad, Sharmila Rajendran, Damion Young, Jon Mason

TL;DR
This paper introduces an automated, cloud-native pipeline for segmenting and clustering tissue structures in histology whole slide images, reducing manual annotation effort by grouping similar structures so that each group can be labeled at once.
Contribution
The authors present a novel end-to-end pipeline that chains segmentation, feature extraction, dimensionality reduction, and clustering, so that tissue components in whole slide images (WSIs) can be annotated cluster by cluster rather than object by object.
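According to the abstract, the clustering stage applies DBSCAN to UMAP-reduced ResNet-50 embeddings. The sketch below is a minimal, from-scratch DBSCAN in NumPy meant only to show how that stage assigns cluster ids and flags outliers. It assumes the embeddings have already been reduced to a small 2-D array; the `eps` and `min_samples` values are illustrative, not the authors' settings. In practice one would more likely call `sklearn.cluster.DBSCAN(eps=..., min_samples=...).fit_predict(X)` on the output of `umap.UMAP(n_components=2).fit_transform(features)`, where `features` is a hypothetical array of per-object embeddings.

```python
from collections import deque

import numpy as np

NOISE = -1       # label for points that belong to no cluster
UNVISITED = -2   # internal marker for points not yet examined


def dbscan(points: np.ndarray, eps: float, min_samples: int) -> np.ndarray:
    """Minimal DBSCAN: return one integer cluster label per row of `points`, -1 for noise."""
    n = len(points)
    # Dense pairwise Euclidean distances; acceptable for a few thousand objects.
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # Each neighbourhood includes the point itself, matching scikit-learn's convention.
    neighbours = [np.flatnonzero(dists[i] <= eps) for i in range(n)]
    labels = np.full(n, UNVISITED, dtype=int)
    cluster_id = 0
    for i in range(n):
        if labels[i] != UNVISITED:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point of a cluster
            continue
        # i is a core point: grow a new cluster from it breadth-first.
        labels[i] = cluster_id
        queue = deque(neighbours[i])
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins the cluster but is not expanded
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster_id
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # j is also a core point, so keep expanding
        cluster_id += 1
    return labels


# Two tight groups of reduced embeddings and one isolated object.
pts = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
                [20.0, 20.0]])
print(dbscan(pts, eps=0.5, min_samples=3).tolist())  # → [0, 0, 0, 1, 1, 1, -1]
```

DBSCAN does not need the number of clusters in advance and marks sparse objects as noise (-1), which plausibly suits an exploratory labelling workflow. The dense distance matrix costs O(n²) memory, so for tens of thousands of objects the scikit-learn implementation, which can use neighbour-search trees, is the practical choice.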
Findings
Achieved 96.8% accuracy in cluster-label alignment across diverse tissue types.
Reduced annotation effort by enabling labeling of clusters instead of individual objects.
Successfully applied to 3,696 tissue components from 13 tissue types across three species.
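Labelling clusters rather than individual objects means one decision per cluster is copied to every member. The sketch below shows that propagation and one plausible way to score it. It assumes "cluster-label alignment" means the share of clustered (non-noise) objects whose propagated label matches a per-object reference label; the summary does not define the metric, so treat this as an assumption. `majority_labels` is a hypothetical stand-in for the human annotator, and the label names are illustrative.

```python
from collections import Counter
from typing import Dict, List, Optional

NOISE = -1


def majority_labels(cluster_ids: List[int], reference: List[str]) -> Dict[int, str]:
    """Hypothetical stand-in for the annotator: name each cluster by its most common reference label."""
    votes: Dict[int, Counter] = {}
    for cid, ref in zip(cluster_ids, reference):
        if cid != NOISE:
            votes.setdefault(cid, Counter())[ref] += 1
    return {cid: counts.most_common(1)[0][0] for cid, counts in votes.items()}


def propagate(cluster_ids: List[int], cluster_labels: Dict[int, str]) -> List[Optional[str]]:
    """Copy each cluster's single label to all of its members; noise objects get None."""
    return [cluster_labels.get(cid) for cid in cluster_ids]


def alignment_accuracy(predicted: List[Optional[str]], reference: List[str]) -> float:
    """Fraction of labelled (non-noise) objects whose propagated label matches the reference."""
    pairs = [(p, r) for p, r in zip(predicted, reference) if p is not None]
    if not pairs:
        raise ValueError("no clustered objects to score")
    return sum(p == r for p, r in pairs) / len(pairs)


ids = [0, 0, 0, 1, 1, NOISE]
ref = ["nucleus", "nucleus", "red_cell", "red_cell", "red_cell", "nucleus"]
labels = majority_labels(ids, ref)      # two labelling decisions cover five objects
pred = propagate(ids, labels)
print(alignment_accuracy(pred, ref))    # → 0.8
```

Here two cluster-level decisions label five objects, and the one mixed member of cluster 0 is what pulls the score below 1.0.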
Abstract
Labelling tissue components in histology whole slide images (WSIs) is prohibitively labour-intensive: a single slide may contain tens of thousands of structures (cells, nuclei, and other morphologically distinct objects), each requiring manual boundary delineation and classification. We present a cloud-native, end-to-end pipeline that automates this process through a cluster-first paradigm. Our system tiles WSIs, filters out tiles deemed unlikely to contain valuable information, segments tissue components with Cellpose-SAM (including cells, nuclei, and other morphologically similar structures), extracts neural embeddings via a pretrained ResNet-50, reduces dimensionality with UMAP, and groups morphologically similar objects using DBSCAN clustering. Under this paradigm, a human annotator labels representative clusters rather than individual objects, reducing annotation effort by orders of…
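The first two stages named in the abstract, tiling and discarding uninformative tiles, can be sketched as follows. The filtering criterion is an assumption: the abstract only says tiles "deemed unlikely to contain valuable information" are removed, so this sketch uses a hypothetical brightness heuristic based on the bright background of brightfield slides. The tile size of 256 and the `background_level=220` and `min_tissue=0.25` thresholds are illustrative values, not the authors'. A real WSI would be read region by region (for example with OpenSlide) rather than loaded as one array.

```python
import numpy as np


def tile_image(image: np.ndarray, tile: int):
    """Yield (row, col, patch) for non-overlapping tile x tile patches; ragged edges are dropped."""
    height, width = image.shape[:2]
    for r in range(0, height - tile + 1, tile):
        for c in range(0, width - tile + 1, tile):
            yield r, c, image[r:r + tile, c:c + tile]


def tissue_fraction(rgb_patch: np.ndarray, background_level: int = 220) -> float:
    """Share of pixels darker than the bright slide background (hypothetical heuristic)."""
    grey = rgb_patch.mean(axis=-1)  # simple channel average as a brightness proxy
    return float((grey < background_level).mean())


def informative_tiles(image: np.ndarray, tile: int = 256, min_tissue: float = 0.25):
    """Return (row, col) origins of tiles whose tissue fraction reaches min_tissue."""
    return [(r, c) for r, c, patch in tile_image(image, tile)
            if tissue_fraction(patch) >= min_tissue]


# Synthetic 512 x 512 "slide": white background with one stained 256 x 256 block.
slide = np.full((512, 512, 3), 255, dtype=np.uint8)
slide[:256, :256] = 100
print(informative_tiles(slide))  # → [(0, 0)]
```

Only the surviving tile origins would then be passed on to segmentation, which is where filtering saves compute on mostly empty slides.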
