Use of natural language processing to extract and classify papillary thyroid cancer features from surgical pathology reports
Ricardo Loor-Torres, Yuqi Wu, Esteban Cabezas, Mariana Borras, David Toro-Tobon, Mayra Duran, Misk Al Zahidy, Maria Mateo Chavez, Cristian Soto Jacome, Jungwei W. Fan, Naykky M. Singh Ospina, Yonghui Wu, Juan P. Brito

TL;DR
This study developed ThyroPath, an NLP pipeline that automates extraction and classification of thyroid cancer features from pathology reports. It reached a 93% strict F1-score on structured reports and 93% overall accuracy in risk classification, which suggests it could be applied at scale in clinical settings.
Contribution
The paper introduces ThyroPath, a rule-based NLP system that accurately extracts and classifies thyroid cancer features from both structured and unstructured pathology reports.
Findings
Achieved 93% strict F1-score in extraction from structured reports.
Demonstrated 93% overall accuracy in risk classification.
Categorized risk levels with accuracy comparable to that of human experts.
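To make the strict F1-score above concrete, here is a minimal sketch of how such a score could be computed. It assumes that "strict" means a prediction counts only when the report, feature name, and value all match the ground truth exactly; the summary does not define the term, and the function name, tuple layout, and sample values are hypothetical.

```python
def strict_f1(gold, predicted):
    """Strict-match F1 over (report_id, feature, value) tuples.

    Assumption: a prediction is a true positive only if the whole tuple
    matches a ground-truth tuple exactly. This is an illustration, not
    the evaluation code used in the paper.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations: one of three predicted values is wrong.
gold = {
    ("r1", "tumor_size_cm", "2.3"),
    ("r1", "lymphovascular_invasion", "present"),
    ("r2", "tumor_size_cm", "1.1"),
}
predicted = {
    ("r1", "tumor_size_cm", "2.3"),
    ("r1", "lymphovascular_invasion", "absent"),
    ("r2", "tumor_size_cm", "1.1"),
}
score = strict_f1(gold, predicted)
```

With two of three tuples matching, precision and recall are both 2/3, so the F1-score is also 2/3.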
Abstract
Background: We aim to use natural language processing (NLP) to automate the extraction and classification of thyroid cancer risk factors from pathology reports.
Methods: We analyzed 1,410 surgical pathology reports from adult papillary thyroid cancer patients at Mayo Clinic, Rochester, MN, from 2010 to 2019. Structured and non-structured reports were used to create a consensus-based ground truth dictionary and were categorized into modified recurrence risk levels. Non-structured reports were narrative, while structured reports followed standardized formats. We then developed ThyroPath, a rule-based NLP pipeline, to extract and classify thyroid cancer features into risk categories. Training involved 225 reports (150 structured, 75 unstructured), with testing on 170 reports (120 structured, 50 unstructured). The pipeline's performance was assessed using both strict and…
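As a rough illustration of what rule-based extraction from a structured report can look like, the sketch below applies a few regular expressions to report text. It is not ThyroPath's rule set: the feature names, patterns, and sample report are all hypothetical, and the risk-classification step is left out because the summary does not state its rules.

```python
import re

# Hypothetical status vocabulary; real reports likely use more variants.
STATUS = r"(present|absent|not identified|identified)"

# Hypothetical patterns, one per feature. Not the rules used in the paper.
PATTERNS = {
    "tumor_size_cm": re.compile(
        r"tumor size\D{0,20}?(\d+(?:\.\d+)?)\s*cm", re.IGNORECASE
    ),
    "extrathyroidal_extension": re.compile(
        r"extrathyroidal extension\s*:?\s*" + STATUS, re.IGNORECASE
    ),
    "lymphovascular_invasion": re.compile(
        r"lymphovascular invasion\s*:?\s*" + STATUS, re.IGNORECASE
    ),
}

# Map surface wording onto two normalized labels.
NORMALIZE = {
    "present": "present",
    "identified": "present",
    "absent": "absent",
    "not identified": "absent",
}


def extract_features(report_text):
    """Return a dict of the features whose pattern matches the report text."""
    features = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(report_text)
        if match is None:
            continue  # feature not mentioned in a form these rules recognize
        value = match.group(1).lower()
        if name == "tumor_size_cm":
            features[name] = float(value)
        else:
            features[name] = NORMALIZE[value]
    return features


sample_report = (
    "Tumor size: 2.3 cm. Extrathyroidal extension: Not identified. "
    "Lymphovascular invasion: Present."
)
extracted = extract_features(sample_report)
```

Fixed patterns like these fit standardized report templates; narrative reports would probably need broader rules, for example for negation and varied phrasing.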
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Topic Modeling
