GENIE: Generative Note Information Extraction model for structuring EHR data
Huaiyuan Ying, Hongyi Yuan, Jinsen Lu, Zitian Qu, Yang Zhao, Zhengyun Zhao, Isaac Kohane, Tianxi Cai, Sheng Yu

TL;DR
GENIE is a system that uses fine-tuned small LLMs to extract structured information from unstructured clinical notes in EHRs. It outperforms traditional methods and is more practical to run at scale than very large LLMs.
Contribution
The paper introduces GENIE, a unified, end-to-end generative model that streamlines EHR note structuring, addressing limitations of existing rule-based and large LLM approaches.
Findings
GENIE achieves high accuracy in extracting entities, assertions, and other attributes.
It outperforms traditional tools like cTAKES and MetaMap in multiple tasks.
The system is scalable and suitable for real-world healthcare applications.
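To make "entities, assertions, and other attributes" concrete, here is a minimal sketch of how a downstream consumer might represent and parse one structured record per extracted phrase. The JSON layout, the field names (`phrase`, `assertion`, `attributes`), and the helper `parse_model_output` are all hypothetical and chosen only for illustration; the summary above does not specify GENIE's actual output format.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ExtractedEntity:
    """One extracted clinical phrase (hypothetical schema, not GENIE's own)."""
    phrase: str                      # text span found in the note
    assertion: str                   # e.g. "present" or "absent" (assumed labels)
    attributes: Dict[str, str] = field(default_factory=dict)  # any other attributes


def parse_model_output(raw: str) -> List[ExtractedEntity]:
    """Parse a JSON array of records into ExtractedEntity objects.

    Missing optional fields fall back to defaults so that a partially
    filled record does not raise an error.
    """
    records = json.loads(raw)
    return [
        ExtractedEntity(
            phrase=r["phrase"],
            assertion=r.get("assertion", "unknown"),
            attributes=r.get("attributes", {}),
        )
        for r in records
    ]


# Hand-written example string standing in for model output (not real GENIE output).
example_output = (
    '[{"phrase": "chest pain", "assertion": "absent", '
    '"attributes": {"body_location": "chest"}}, '
    '{"phrase": "metformin", "assertion": "present"}]'
)

entities = parse_model_output(example_output)
for e in entities:
    print(e.phrase, e.assertion, e.attributes)
```

Keeping every attribute on the same record as its phrase reflects the end-to-end idea described above: a single generation step yields the complete structured output, rather than separate pipeline stages for entity recognition and attribute classification.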
Abstract
Electronic Health Records (EHRs) hold immense potential for advancing healthcare, offering rich, longitudinal data that combines structured information with valuable insights from unstructured clinical notes. However, the unstructured nature of clinical text poses significant challenges for secondary applications. Traditional methods for structuring EHR free-text data, such as rule-based systems and multi-stage pipelines, are often limited by their time-consuming configurations and inability to adapt across clinical notes from diverse healthcare settings. Few systems provide a comprehensive attribute extraction for terminologies. While giant large language models (LLMs) like GPT-4 and LLaMA 405B excel at structuring tasks, they are slow, costly, and impractical for large-scale use. To overcome these limitations, we introduce GENIE, a Generative Note Information Extraction system that…
Taxonomy
Topics: Biomedical Text Mining and Ontologies
Methods: Attention Is All You Need · Linear Layer · Dense Connections · Residual Connection · Multi-Head Attention · Label Smoothing · Layer Normalization · Position-Wise Feed-Forward Layer · Adam · Softmax
