Facilitating phenotyping from clinical texts: the medkit library
Antoine Neuraz, Ghislain Vaillant, Camila Arias, Olivier Birot, Kim-Tam Huynh, Thibaut Fabacher, Alice Rogier, Nicolas Garcelon, Ivan Lerner, Bastien Rance, Adrien Coulet

TL;DR
The paper introduces medkit, an open-source Python library designed to streamline the development, evaluation, and reproducibility of phenotyping pipelines built on clinical texts from Electronic Health Records, addressing the heterogeneity and specialized nature of those texts.
Contribution
It presents a modular, reusable software framework for phenotyping from clinical texts, including pre-built operations and pipelines to facilitate research and collaboration.
Findings
Enables rapid assembly of phenotyping pipelines
Improves reproducibility of phenotyping studies
Reduces time and cost in clinical text analysis
Abstract
Phenotyping consists of applying algorithms to identify individuals associated with a specific, potentially complex, trait or condition, typically from a collection of Electronic Health Records (EHRs). Because much of the clinical information in EHRs lies in texts, phenotyping from text plays an important role in studies that rely on the secondary use of EHRs. However, the heterogeneous and highly specialized content and form of clinical texts make this task particularly tedious and are a source of time and cost constraints in observational studies. To facilitate the development, evaluation and reproducibility of phenotyping pipelines, we developed an open-source Python library named medkit. It enables composing data processing pipelines from easy-to-reuse software bricks, named medkit operations. In addition to the core of the library, we share the…
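To make the idea of composing reusable operations into a phenotyping pipeline more concrete, here is a toy sketch in plain Python. It is not medkit's actual API: every name below (`Document`, `Annotation`, `split_sentences`, `make_term_matcher`, `detect_negation`, `run_pipeline`, `has_phenotype`) is hypothetical, and the regex-based segmentation, dictionary matching and negation cues are deliberately naive stand-ins for real clinical NLP components. The point is only the shape of the design: each operation reads and enriches a shared document, so operations can be swapped or reused across pipelines.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Annotation:
    label: str  # hypothetical labels: "sentence" or "entity"
    start: int  # character offsets into Document.text
    end: int
    attrs: Dict[str, object] = field(default_factory=dict)


@dataclass
class Document:
    text: str
    anns: List[Annotation] = field(default_factory=list)


# Hypothetical operation interface: a callable that enriches a Document.
Operation = Callable[[Document], Document]


def split_sentences(doc: Document) -> Document:
    # Naive segmentation: runs of text ending with '.', '!' or '?'.
    for m in re.finditer(r"[^.!?]+[.!?]?", doc.text):
        if m.group().strip():
            doc.anns.append(Annotation("sentence", m.start(), m.end()))
    return doc


def make_term_matcher(terms: Dict[str, str]) -> Operation:
    # Dictionary matcher; `terms` maps a lowercase surface form to a concept label.
    def match(doc: Document) -> Document:
        lower = doc.text.lower()
        for surface, concept in terms.items():
            for m in re.finditer(r"\b" + re.escape(surface) + r"\b", lower):
                doc.anns.append(
                    Annotation("entity", m.start(), m.end(), {"concept": concept})
                )
        return doc
    return match


def detect_negation(doc: Document) -> Document:
    # Flag an entity as negated if a cue word precedes it in its sentence.
    cues = re.compile(r"\b(no|denies|without)\b")
    sentences = [a for a in doc.anns if a.label == "sentence"]
    for ent in (a for a in doc.anns if a.label == "entity"):
        for s in sentences:
            if s.start <= ent.start < s.end:
                prefix = doc.text[s.start:ent.start].lower()
                ent.attrs["negated"] = bool(cues.search(prefix))
    return doc


def run_pipeline(ops: List[Operation], doc: Document) -> Document:
    # Apply each operation in order; order matters (negation needs sentences).
    for op in ops:
        doc = op(doc)
    return doc


def has_phenotype(doc: Document, concept: str) -> bool:
    # Toy patient-level rule: at least one non-negated mention of the concept.
    return any(
        a.label == "entity"
        and a.attrs.get("concept") == concept
        and not a.attrs.get("negated", False)
        for a in doc.anns
    )


pipeline = [
    split_sentences,
    make_term_matcher({"diabetes": "diabetes", "insulin": "insulin"}),
    detect_negation,
]
doc = run_pipeline(pipeline, Document("Patient denies diabetes. Started on insulin."))
print(has_phenotype(doc, "diabetes"))  # False (mention is negated)
print(has_phenotype(doc, "insulin"))   # True
```

Because the matcher is built by a factory, the same negation and segmentation steps can be reused with a different terminology simply by passing another `terms` dictionary, which is the kind of reuse the abstract attributes to medkit operations.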
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Health Sciences Research and Education
Methods: Lib
