Adapting Natural Language Processing Models Across Jurisdictions: A Pilot Study in Canadian Cancer Registries
Jonathan Simkin, Lovedeep Gondara, Zeeshan Rizvi, Gregory Doyle, Jeff Dowden, Dan Bond, Desmond Martin, and Raymond Ng

TL;DR
This study evaluates how well transformer-based NLP models for Canadian cancer registries adapt across jurisdictions. It shows that modest fine-tuning preserves high performance and that ensembling reduces missed cases.
Contribution
First cross-provincial evaluation of transformer models for cancer registry NLP, showing effective adaptation and ensemble strategies to improve sensitivity across jurisdictions.
Findings
Models maintained high performance after cross-jurisdiction adaptation.
Ensemble approach achieved 0.99 recall, reducing missed cancers.
Privacy-preserving workflow enables interoperable NLP infrastructure.
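The page does not say how the ensemble combines model outputs. One common way to trade precision for recall is an OR rule: a report is flagged as cancer if any model flags it, so a case is missed only when every model misses it. The sketch below assumes that rule. The names `or_ensemble`, `model_a` and `model_b` are hypothetical, and the labels are toy values, not data from the study.

```python
from typing import Sequence


def or_ensemble(*model_preds: Sequence[int]) -> list[int]:
    """Flag a report as positive (1) if ANY model flags it (assumed OR rule)."""
    return [int(any(votes)) for votes in zip(*model_preds)]


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of true positives that were flagged: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of flagged reports that are true positives: TP / (TP + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical toy labels (1 = cancer); NOT results from the paper.
y_true  = [1, 1, 1, 1, 0, 0, 1, 0]
model_a = [1, 0, 1, 1, 0, 0, 1, 0]  # misses report 1
model_b = [1, 1, 0, 1, 0, 1, 1, 0]  # misses report 2, adds a false positive
ens = or_ensemble(model_a, model_b)

print(recall(y_true, model_a), recall(y_true, model_b))  # each model alone
print(recall(y_true, ens), precision(y_true, ens))       # combined
```

In this toy case each model alone reaches a recall of 0.8, while the OR ensemble recovers every positive at the cost of one extra false positive. This is the kind of trade-off a registry might accept when missed cancers are costlier than extra manual review.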
Abstract
Population-based cancer registries depend on pathology reports as their primary diagnostic source, yet manual abstraction is resource-intensive and delays the availability of cancer data. While transformer-based NLP systems have improved registry workflows, their ability to generalize across jurisdictions with differing reporting conventions remains poorly understood. We present the first cross-provincial evaluation of adapting BCCRTron, a domain-adapted transformer model developed at the British Columbia Cancer Registry, alongside GatorTron, a biomedical transformer model, for cancer surveillance in Canada. Our training data consisted of approximately 104,000 and 22,000 de-identified pathology reports from the Newfoundland & Labrador Cancer Registry (NLCR) for the Tier 1 (cancer vs. non-cancer) and Tier 2 (reportable vs. non-reportable) tasks, respectively. Both models were fine-tuned…
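The two tasks can be read as a cascade: Tier 1 separates cancer from non-cancer reports, and Tier 2 decides reportability. The smaller Tier 2 dataset suggests, but the abstract does not state, that Tier 2 runs only on reports Tier 1 flags as cancer. The sketch below assumes that routing. `two_tier_label`, `toy_tier1` and `toy_tier2` are hypothetical names. The keyword rules are placeholders standing in for fine-tuned models and do not reflect actual registry reportability rules.

```python
from typing import Callable

# A classifier maps report text to a binary label (1 = positive class).
Classifier = Callable[[str], int]


def two_tier_label(report: str, tier1: Classifier, tier2: Classifier) -> str:
    """Route a report through Tier 1, then Tier 2 only if Tier 1 is positive."""
    if tier1(report) == 0:
        return "non-cancer"
    return "reportable" if tier2(report) == 1 else "non-reportable"


# Placeholder stand-ins for fine-tuned models (hypothetical, keyword-based).
def toy_tier1(report: str) -> int:
    return int("carcinoma" in report.lower())


def toy_tier2(report: str) -> int:
    return int("invasive" in report.lower())


for text in ["Benign fibroadenoma", "Invasive ductal carcinoma",
             "Ductal carcinoma in situ"]:
    print(text, "->", two_tier_label(text, toy_tier1, toy_tier2))
```

Passing the classifiers in as callables keeps the routing logic separate from the models. A model fine-tuned in one province could then be swapped for another without changing the pipeline.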
Taxonomy
Topics: Topic Modeling · Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education
