Hierarchical Deep Learning Ensemble to Automate the Classification of Breast Cancer Pathology Reports by ICD-O Topography
Waheeda Saib, David Sengeh, Gcininwe Dlamini, Elvira Singh

TL;DR
This paper introduces a hierarchical deep learning ensemble that automates the classification of breast cancer pathology reports by ICD-O topography, significantly improving accuracy over existing CNN models and reducing manual coding effort.
Contribution
The study presents a novel hierarchical deep learning ensemble approach that outperforms state-of-the-art CNN models in classifying ICD-O topography codes for breast cancer pathology reports.
Findings
Over 14% improvement in F1 micro score over the state-of-the-art CNN model
55% increase in F1 macro score over the same baseline
Enhanced classification accuracy over flat models
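The two F1 variants above weigh classes differently. Micro-averaging pools true and false positives over all reports, so frequent classes dominate. Macro-averaging takes the unweighted mean of per-class F1, so rare topography codes count as much as common ones. The sketch below illustrates the difference on a handful of made-up labels. The helper name `f1_scores` and the toy data are assumptions for illustration and do not come from the paper.

```python
from collections import Counter

def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro: compute F1 once from counts pooled over all classes.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro

# Hypothetical toy labels (not data from the paper): a model that always
# predicts the majority code.
y_true = ["C50.9"] * 8 + ["C50.4", "C50.1"]
y_pred = ["C50.9"] * 10
micro, macro = f1_scores(y_true, y_pred)
print(f"micro={micro:.3f} macro={macro:.3f}")
```

On imbalanced data like this, the micro score stays high while the macro score drops sharply, which is why a method that helps rare classes can show a much larger macro gain than micro gain.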
Abstract
Like most global cancer registries, the National Cancer Registry in South Africa employs expert human coders to label pathology reports using appropriate International Classification of Diseases for Oncology (ICD-O) codes spanning 42 different cancer types. The annotation is extensive for the large volume of cancer pathology reports the registry receives annually from public and private sector institutions. This manual process, coupled with other challenges, results in a significant 4-year lag in reporting of annual cancer statistics in South Africa. We present a hierarchical deep learning ensemble method incorporating state-of-the-art convolutional neural network models for the automatic labelling of 2201 de-identified, free-text pathology reports with appropriate ICD-O breast cancer topography codes across 8 classes. Our results show an improvement in primary site classification over…
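The general idea of hierarchical classification with an ensemble can be sketched as follows: an ensemble first picks a coarse group, then a group-specific model picks the final code, instead of one flat model choosing among all classes at once. This is only a minimal sketch under stated assumptions. The two-level grouping, the keyword-based stand-in models, and the names `average_ensemble`, `hierarchical_predict`, and `keyword_model` are hypothetical. The paper's actual hierarchy, ensemble rule, and CNN architectures are not described in the text above.

```python
from typing import Callable, Dict, List

# A classifier maps report text to a {label: probability} dictionary.
Classifier = Callable[[str], Dict[str, float]]

def average_ensemble(models: List[Classifier]) -> Classifier:
    """Combine classifiers by averaging their label probabilities."""
    def predict(text: str) -> Dict[str, float]:
        totals: Dict[str, float] = {}
        for model in models:
            for label, p in model(text).items():
                totals[label] = totals.get(label, 0.0) + p
        return {label: s / len(models) for label, s in totals.items()}
    return predict

def hierarchical_predict(text: str, parent: Classifier,
                         children: Dict[str, Classifier]) -> str:
    """Pick a coarse group first, then a code within that group."""
    parent_probs = parent(text)
    group = max(parent_probs, key=parent_probs.get)
    child_probs = children[group](text)
    return max(child_probs, key=child_probs.get)

def keyword_model(rules: Dict[str, str], default: str) -> Classifier:
    """Toy stand-in for a trained CNN (assumption): all probability goes
    to the first label whose keyword appears in the text."""
    def predict(text: str) -> Dict[str, float]:
        lowered = text.lower()
        for keyword, label in rules.items():
            if keyword in lowered:
                return {label: 1.0}
        return {default: 1.0}
    return predict

# Hypothetical two-level hierarchy over a few breast topography codes.
parent = average_ensemble([
    keyword_model({"quadrant": "quadrant"}, "other"),
    keyword_model({"upper": "quadrant", "lower": "quadrant"}, "other"),
])
children = {
    "quadrant": keyword_model({"upper-outer": "C50.4",
                               "upper-inner": "C50.2"}, "C50.9"),
    "other": keyword_model({"nipple": "C50.0", "central": "C50.1"}, "C50.9"),
}

code = hierarchical_predict(
    "Mass in the upper-outer quadrant of the left breast", parent, children)
print(code)
```

In a real system each `keyword_model` would be replaced by a trained text classifier. The design point is that each child model only has to separate the few codes within its group, which can help when some codes have very few training reports.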
Taxonomy
Topics: AI in cancer detection · Image Retrieval and Classification Techniques · Biomedical Text Mining and Ontologies
