Classification of cancer pathology reports: a large-scale comparative study

Stefano Martina; Leonardo Ventura; Paolo Frasconi

arXiv:2006.16370·cs.LG·January 12, 2021


TL;DR

This study applies deep learning models to automatically classify free-text cancer pathology reports into ICD-O3 topography and morphology codes, reaching 90.3% and 84.8% accuracy respectively on a large Italian dataset while keeping predictions interpretable.

Contribution

It compares deep learning architectures for cancer report classification on both prediction accuracy and interpretability, finding that hierarchical models are not better than flat ones and that element-wise maximum aggregation is slightly better than attention on site classification.

Findings

01 Achieved 90.3% accuracy on topography classification

02 Achieved 84.8% accuracy on morphology classification

03 Hierarchical models did not outperform flat models

Abstract

We report on the application of state-of-the-art deep learning techniques to the automatic and interpretable assignment of ICD-O3 topography and morphology codes to free-text cancer reports. We present results on a large dataset (more than 80 000 labeled and 1 500 000 unlabeled anonymized reports written in Italian and collected from hospitals in Tuscany over more than a decade) with a large number of classes (134 morphological classes and 61 topographical classes). We compare alternative architectures in terms of prediction accuracy and interpretability and show that our best model achieves a multiclass accuracy of 90.3% on topography site assignment and 84.8% on morphology type assignment. We found that in this context hierarchical models are not better than flat models and that an element-wise maximum aggregator is slightly better than attentive models on site classification.…
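To make the two aggregation strategies named in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the toy token vectors, and the dot-product query used for attention are all assumptions chosen for illustration. The sketch shows how an element-wise maximum reduces a sequence of token vectors to one vector, and how recording which token supplied each maximum gives a simple handle on interpretability, alongside a softmax-weighted (attentive) average for comparison.

```python
import math


def max_aggregate(token_vectors):
    """Element-wise maximum over token vectors.

    Returns the pooled vector and, for each feature, the index of the
    token that supplied the maximum (hypothetical interpretability aid).
    """
    dims = len(token_vectors[0])
    pooled, winners = [], []
    for d in range(dims):
        column = [vec[d] for vec in token_vectors]
        best = max(range(len(column)), key=column.__getitem__)
        pooled.append(column[best])
        winners.append(best)
    return pooled, winners


def attention_aggregate(token_vectors, query):
    """Softmax-weighted average of token vectors.

    Scores are dot products with an assumed query vector; a trained
    model would learn this query (or a small scoring network).
    """
    scores = [sum(v * q for v, q in zip(vec, query)) for vec in token_vectors]
    peak = max(scores)  # subtract the peak for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dims = len(token_vectors[0])
    pooled = [
        sum(w * vec[d] for w, vec in zip(weights, token_vectors))
        for d in range(dims)
    ]
    return pooled, weights


# Toy data (assumed): three tokens with three-dimensional representations.
tokens = ["carcinoma", "of", "colon"]
vectors = [[0.9, 0.1, 0.2], [0.0, 0.0, 0.1], [0.2, 0.8, 0.3]]

pooled_max, winners = max_aggregate(vectors)
pooled_att, weights = attention_aggregate(vectors, query=[1.0, 1.0, 0.0])

print(pooled_max, [tokens[i] for i in winners])
print(pooled_att, weights)
```

With maximum aggregation each output feature traces back to exactly one token, whereas attention spreads credit over all tokens through its weights; either can be inspected to see which words drove a prediction.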


Code & Models

Repositories

trianam/cancerReportsClassification
none · Official


Taxonomy

Methods · Interpretability