Development and Validation of MicrobEx: an Open-Source Package for Microbiology Culture Concept Extraction
Garrett Eickelberg, Yuan Luo, L. Nelson Sanchez-Pinto

TL;DR
MicrobEx is an open-source NLP tool that accurately extracts microbiology culture information from free-text reports, facilitating secondary clinical data use and supporting various healthcare applications.
Contribution
This paper introduces MicrobEx, a rule-based NLP package that reliably extracts culture positivity and bacteria identification from microbiology reports across multiple institutions.
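The general idea of rule-based culture extraction can be shown with a minimal Python sketch. This is not MicrobEx's actual API or rule set. The following are all hypothetical, chosen only to illustrate the technique:

- the function `extract_culture`
- the `CultureResult` type
- the regex pattern lists
- the precedence rule: any recognized organism makes the culture positive; otherwise a "no growth"-style phrase makes it negative; otherwise the result stays undetermined

The SNOMED CT concept IDs are included for illustration and should be checked against a SNOMED CT browser before any real use.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical phrases signalling a negative culture (a real rule set is far larger).
NEGATIVE_PATTERNS = [
    r"\bno growth\b",
    r"\bno (?:significant )?organisms? (?:isolated|seen|detected)\b",
]

# Hypothetical lexicon: regex for surface forms -> (preferred name, SNOMED CT concept ID).
# IDs are for illustration; verify them against a SNOMED CT browser before use.
ORGANISM_LEXICON = {
    r"\b(?:staphylococcus|staph\.?) aureus\b|\bmrsa\b": ("Staphylococcus aureus", "3092008"),
    r"\b(?:escherichia|e\.) coli\b": ("Escherichia coli", "112283007"),
}


@dataclass
class CultureResult:
    positive: Optional[bool]  # None means the rules could not decide
    organisms: List[Tuple[str, str]] = field(default_factory=list)


def extract_culture(report: str) -> CultureResult:
    """Classify culture positivity and list mapped organisms for one free-text report."""
    text = report.lower()
    organisms: List[Tuple[str, str]] = []
    for pattern, concept in ORGANISM_LEXICON.items():
        # Record each mapped concept once, even if several surface forms match.
        if re.search(pattern, text) and concept not in organisms:
            organisms.append(concept)
    if organisms:
        # Assumed precedence: any recognized organism marks the culture positive.
        return CultureResult(positive=True, organisms=organisms)
    if any(re.search(p, text) for p in NEGATIVE_PATTERNS):
        return CultureResult(positive=False)
    return CultureResult(positive=None)


if __name__ == "__main__":
    print(extract_culture("Blood culture: MRSA isolated from 2/2 bottles."))
    print(extract_culture("No growth at 5 days."))
```

A production rule set would also need to handle negated organism mentions, likely contaminants, and multi-section reports, all of which this sketch ignores.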
Findings
F-1 scores >0.95 on all classification tasks across both validation sets
Developed on two EHR systems and externally validated at two additional institutions
High accuracy in extracting microbiology data
Abstract
Microbiology culture reports contain critical information for important clinical and public health applications. However, microbiology reports often have complex, semi-structured, free-text data that present a barrier for secondary use. Here we present the development and validation of an open-source package designed to ingest free-text microbiology reports, determine whether the culture is positive, and return a list of SNOMED-CT mapped bacteria. Our rule-based natural language processing algorithm was developed using microbiology reports from two different electronic health record systems in a large healthcare organization, and then externally validated on the reports of two other institutions with manually extracted results as a benchmark. Our algorithm achieved F-1 scores >0.95 on all classification tasks across both validation sets. Our concept extraction Python package, MicrobEx,…
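The validation compares algorithm output against manually extracted results using F-1, where F-1 = 2TP / (2TP + FP + FN). The sketch below shows how that score could be computed for the two task types. It assumes binary labels for culture positivity and per-report sets of organism concept IDs for bacteria identification. The function names and the micro-averaging choice are assumptions, because this excerpt does not state the paper's exact averaging scheme.

```python
from typing import Sequence, Set


def _f1(tp: int, fp: int, fn: int) -> float:
    """F-1 = 2TP / (2TP + FP + FN); returns 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def binary_f1(predicted: Sequence[bool], reference: Sequence[bool]) -> float:
    """F-1 for a yes/no task such as culture positivity (positive class = True)."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)
    fp = sum(1 for p, r in pairs if p and not r)
    fn = sum(1 for p, r in pairs if not p and r)
    return _f1(tp, fp, fn)


def micro_f1_sets(predicted: Sequence[Set[str]], reference: Sequence[Set[str]]) -> float:
    """Micro-averaged F-1 over per-report organism sets (e.g. SNOMED CT concept IDs)."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    pairs = list(zip(predicted, reference))
    tp = sum(len(p & r) for p, r in pairs)  # organisms found and expected
    fp = sum(len(p - r) for p, r in pairs)  # organisms found but not expected
    fn = sum(len(r - p) for p, r in pairs)  # organisms expected but missed
    return _f1(tp, fp, fn)


if __name__ == "__main__":
    print(round(binary_f1([True, True, False, True, False],
                          [True, False, False, True, True]), 3))  # 0.667
    print(micro_f1_sets([{"3092008"}, {"112283007", "3092008"}],
                        [{"3092008"}, {"112283007"}]))  # 0.8
```

Micro-averaging pools counts across all reports, so reports with many organisms weigh more. A per-organism (macro) average would be an equally reasonable alternative.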
Taxonomy
Topics: Bacterial Identification and Susceptibility Testing · Biomedical Text Mining and Ontologies · Topic Modeling
