Developing a Portable Natural Language Processing Based Phenotyping System
Himanshu Sharma, Chengsheng Mao, Yizhen Zhang, Haleh Vatani, Liang Yao, Yizhen Zhong, Luke Rasmussen, Guoqian Jiang, Jyotishman Pathak, Yuan Luo

TL;DR
This paper introduces a portable NLP-based phenotyping system that integrates rule-based and machine learning methods, utilizing UMLS and OMOP CDM for standardization and adaptability across institutions.
Contribution
The system uniquely combines rule-based and statistical approaches with standard data models, enabling portability and reuse of clinical NLP components across different healthcare settings.
Findings
Achieved top-10 performance on the i2b2 Obesity Challenge phenotyping task
Standardized data extraction using UMLS and OMOP CDM
Facilitated reuse and extension of rule-based NLP systems
Abstract
This paper presents a portable phenotyping system that is capable of integrating both rule-based and statistical machine learning based approaches. Our system utilizes UMLS to extract clinically relevant features from the unstructured text and then facilitates portability across different institutions and data systems by incorporating OHDSI's OMOP Common Data Model (CDM) to standardize necessary data elements. Our system can also store the key components of rule-based systems (e.g., regular expression matches) in the format of OMOP CDM, thus enabling the reuse, adaptation and extension of many existing rule-based clinical NLP systems. We experimented with our system on the corpus from i2b2's Obesity Challenge as a pilot study. Our system facilitates portable phenotyping of obesity and its 15 comorbidities based on the unstructured patient discharge summaries, while achieving a…
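To make the idea of storing regular expression matches in OMOP CDM format more concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes the matches are written to rows shaped like the OMOP CDM NOTE_NLP table (only a subset of its columns is shown), and the pattern, the `NoteNlpRow` class, the `regex_matches_to_rows` function, and the `nlp_system` label are all hypothetical names introduced for illustration.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class NoteNlpRow:
    """Subset of columns modeled on the OMOP CDM NOTE_NLP table (assumption)."""
    note_nlp_id: int
    note_id: int
    snippet: str          # short text window around the match
    offset: str           # character offset of the match, kept as a string here
    lexical_variant: str  # the exact text that matched
    nlp_system: str       # name of the system or rule set that produced the row


# Hypothetical rule: a toy pattern for obesity mentions, not taken from the paper.
OBESITY_PATTERN = re.compile(r"\b(?:morbid(?:ly)?\s+)?obes(?:e|ity)\b", re.IGNORECASE)


def regex_matches_to_rows(
    note_id: int,
    text: str,
    pattern: "re.Pattern[str]",
    nlp_system: str = "regex-demo",
    window: int = 20,
) -> List[NoteNlpRow]:
    """Turn every regex match in one note into a NOTE_NLP-like row."""
    rows = []
    for i, match in enumerate(pattern.finditer(text), start=1):
        # Keep a little context on both sides of the match as the snippet.
        start = max(0, match.start() - window)
        end = min(len(text), match.end() + window)
        rows.append(
            NoteNlpRow(
                note_nlp_id=i,
                note_id=note_id,
                snippet=text[start:end],
                offset=str(match.start()),
                lexical_variant=match.group(0),
                nlp_system=nlp_system,
            )
        )
    return rows


# Usage with a made-up sentence (not from the i2b2 corpus).
demo_text = "Patient is morbidly obese with a history of obesity."
for row in regex_matches_to_rows(note_id=1, text=demo_text, pattern=OBESITY_PATTERN):
    print(row)
```

Because each row records only the matched text, its position, and the producing system, a downstream classifier or another site's rule set could in principle consume the same rows without rerunning the original expressions, which is the kind of reuse the abstract describes. Concept mapping (for example to UMLS identifiers) and negation handling are deliberately left out of this sketch.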
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Natural Language Processing Techniques · Topic Modeling
