Efficiently Reusing Natural Language Processing Models for Phenotype-Mention Identification in Free-text Electronic Medical Records: Methodology Study

Honghan Wu; Karen Hodgson; Sue Dyson; Katherine I. Morley; Zina M. Ibrahim; Ehtesham Iqbal; Robert Stewart; Richard JB Dobson; Cathie Sudlow

arXiv:1903.03995·cs.CL·October 25, 2019

TL;DR

This study introduces a phenotype embedding method to efficiently reuse NLP models for phenotype-mention identification in free-text medical records, reducing validation and retraining efforts while maintaining high accuracy.

Contribution

The paper proposes a novel phenotype embedding approach that minimizes waste in model reuse without requiring labeled data from new settings.

Findings

1. Identifies up to 76% of phenotype mentions without requiring validation.

2. Maintains 93-97% accuracy in new tasks.

3. Reduces model adaptation effort by around 80%.

Abstract

Background: Many efforts have been put into the use of automated approaches, such as natural language processing (NLP), to mine or extract data from free-text medical records to construct comprehensive patient profiles for delivering better health-care. Reusing NLP models in new settings, however, remains cumbersome - requiring validation and/or retraining on new data iteratively to achieve convergent results.

Objective: The aim of this work is to minimize the effort involved in reusing NLP models on free-text medical records.

Methods: We formally define and analyse the model adaptation problem in phenotype-mention identification tasks. We identify "duplicate waste" and "imbalance waste", which collectively impede efficient model reuse. We propose a phenotype embedding based approach to minimize these sources of waste without the need for labelled data from new settings.

Results:…
