A Robust BERT-Based Deep Learning Model for Automated Cancer Type Extraction from Unstructured Pathology Reports
Minh Tran, Jeffery C. Chan, Min Li Huang, Maya Kansara, John P. Grady, Christine E. Napier, Subotheni Thavaneswaran, Mandy L. Ballinger, David M. Thomas, Frank P. Lin

TL;DR
This paper presents a robust BERT-based deep learning system that accurately extracts cancer types from unstructured pathology reports, supporting precision oncology research with high performance and scalability.
Contribution
It introduces a fine-tuned RoBERTa model specifically designed for extracting cancer types, outperforming baseline models and large language models in clinical text analysis.
Findings
Achieved F1 score of 0.98 with the model.
Exact match rate of 80.61%.
Demonstrated scalability for clinical integration.
Abstract
The accurate extraction of clinical information from electronic medical records is particularly critical to clinical research but require much trained expertise and manual labor. In this study we developed a robust system for automated extraction of the specific cancer types for the purpose of supporting precision oncology research. from pathology reports using a fine-tuned RoBERTa model. This model significantly outperformed the baseline model and a Large Language Model, Mistral 7B, achieving F1_Bertscore 0.98 and overall exact match of 80.61%. This fine-tuning approach demonstrates the potential for scalability that can integrate seamlessly into the molecular tumour board process. Fine-tuning domain-specific models for precision tasks in oncology, may pave the way for more efficient and accurate clinical information extraction.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsBiomedical Text Mining and Ontologies · Topic Modeling · AI in cancer detection
