Enriched Annotations for Tumor Attribute Classification from Pathology Reports with Limited Labeled Data
Nick Altieri, Briton Park, Mara Olson, John DeNero, Anobel Odisho, Bin Yu

TL;DR
This paper introduces a novel annotation scheme and algorithm that improve tumor attribute classification from pathology reports when labeled data is limited, achieving high accuracy with fewer labels and less annotation effort.
Contribution
The authors develop an enriched hierarchical annotation scheme and the Supervised Line Attention (SLA) algorithm, enabling effective tumor attribute classification with less labeled data and annotation time.
Findings
SLA achieves similar or better accuracy with half the labeled data compared to state-of-the-art methods.
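Enriched annotations take about 20% longer per document, but because roughly half as many labeled documents are needed, overall annotation effort falls by about 40% (0.5 × 1.2 = 0.6 of the original effort).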
The method is effective on small datasets of 32 to 186 documents.
Abstract
Precision medicine has the potential to revolutionize healthcare, but much of the data for patients is locked away in unstructured free-text, limiting research and delivery of effective personalized treatments. Generating large annotated datasets for information extraction from clinical notes is often challenging and expensive due to the high level of expertise needed for high quality annotations. To enable natural language processing for small dataset sizes, we develop a novel enriched hierarchical annotation scheme and algorithm, Supervised Line Attention (SLA), and apply this algorithm to predicting categorical tumor attributes from kidney and colon cancer pathology reports from the University of California San Francisco (UCSF). Whereas previous work only annotated document level labels, we in addition ask the annotators to enrich the traditional label by asking them to also…
Taxonomy
Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Machine Learning in Healthcare
