Efficiently Reusing Natural Language Processing Models for Phenotype-Mention Identification in Free-text Electronic Medical Records: Methodology Study
Honghan Wu, Karen Hodgson, Sue Dyson, Katherine I. Morley, Zina M. Ibrahim, Ehtesham Iqbal, Robert Stewart, Richard JB Dobson, Cathie Sudlow

TL;DR
This study introduces a phenotype embedding method to efficiently reuse NLP models for phenotype-mention identification in free-text medical records, reducing validation and retraining efforts while maintaining high accuracy.
Contribution
The paper proposes a novel phenotype embedding approach that minimizes two sources of waste in model reuse, "duplicate waste" and "imbalance waste", without requiring labelled data from new settings.
Findings
Identifies up to 76% of phenotype mentions without requiring validation.
Maintains 93-97% accuracy in new tasks.
Reduces effort in model adaptation by around 80%.
Abstract
Background: Much effort has gone into automated approaches, such as natural language processing (NLP), for mining or extracting data from free-text medical records to construct comprehensive patient profiles for delivering better health care. Reusing NLP models in new settings, however, remains cumbersome, requiring iterative validation and/or retraining on new data to achieve convergent results. Objective: The aim of this work is to minimize the effort involved in reusing NLP models on free-text medical records. Methods: We formally define and analyse the model adaptation problem in phenotype-mention identification tasks. We identify "duplicate waste" and "imbalance waste", which collectively impede efficient model reuse. We propose a phenotype embedding-based approach to minimize these sources of waste without the need for labelled data from new settings. Results:…
