Integrating Health Care Data in an Informatics for Integrating Biology & the Bedside (i2b2) Model Persisted Through Elasticsearch: Design, Implementation, and Evaluation in a French University Hospital
Romain Griffier, Fleur Mougin, Vianney Jouhet

TL;DR
This paper describes how persisting the i2b2 model in Elasticsearch rather than in a relational database improves query performance for health data analysis at a large university hospital.
Contribution
The paper introduces adaptations to the i2b2 model for Elasticsearch, enabling efficient query performance and reduced storage needs.
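As a rough illustration of what adapting a relational star schema to a document store can involve, the sketch below groups i2b2-style `observation_fact` rows into one document per patient and builds an Elasticsearch query body that combines a coded criterion with a free-text criterion. The document layout (a nested `observations` field with `concept_cd`, `start_date`, and `text` subfields) and the helper names `facts_to_patient_docs` and `phenotype_query` are assumptions made for this example; they are not taken from the paper, whose actual mapping may differ.

```python
from collections import defaultdict


def facts_to_patient_docs(fact_rows):
    """Group i2b2-style observation_fact rows into one document per patient.

    Hypothetical layout: each patient document carries a list of
    observations that would be mapped as a `nested` field in Elasticsearch.
    """
    docs = defaultdict(lambda: {"observations": []})
    for row in fact_rows:
        doc = docs[row["patient_num"]]
        doc["patient_num"] = row["patient_num"]
        doc["observations"].append({
            "concept_cd": row["concept_cd"],
            "start_date": row["start_date"],
            # Free text, if any, is kept next to the coded observation.
            "text": row.get("observation_blob", ""),
        })
    return list(docs.values())


def phenotype_query(concept_cd, free_text):
    """Build a query body selecting patients who have both a coded
    observation and an observation whose text matches `free_text`."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"nested": {
                        "path": "observations",
                        "query": {"term": {"observations.concept_cd": concept_cd}},
                    }},
                    {"nested": {
                        "path": "observations",
                        "query": {"match": {"observations.text": free_text}},
                    }},
                ]
            }
        }
    }


# Illustrative rows only; the codes and text are made up for the example.
rows = [
    {"patient_num": 1, "concept_cd": "ICD10:E11", "start_date": "2020-01-05"},
    {"patient_num": 1, "concept_cd": "DOC:REPORT", "start_date": "2020-01-06",
     "observation_blob": "type 2 diabetes follow-up"},
    {"patient_num": 2, "concept_cd": "ICD10:I10", "start_date": "2021-03-12"},
]
patient_docs = facts_to_patient_docs(rows)
query_body = phenotype_query("ICD10:E11", "diabetes")
```

The two criteria sit in separate `nested` clauses so that they may be satisfied by different observations of the same patient, which mirrors how a phenotyping query typically combines independent criteria. The resulting dictionary could be passed as the request body of a search call against an index with a matching mapping.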
Findings
Elasticsearch outperforms relational databases in query execution times, especially for free-text searches.
Elasticsearch requires less disk space than an indexed relational database.
The implementation is now in production at Bordeaux University Hospital.
Abstract
The volume of digital data in health care is continually growing. In addition to its use in health care, the health data collected can also serve secondary purposes, such as research. In this context, clinical data warehouses (CDWs) provide the infrastructure and organization necessary to enhance the secondary use of health data. Various data models have been proposed for structuring data in a CDW, including the Informatics for Integrating Biology & the Bedside (i2b2) model, which relies on a relational database. However, this persistence approach can lead to performance issues when executing queries on massive data sets. This study aims to describe the necessary transformations and their implementation to enable i2b2’s search engine to perform the phenotyping task using data persistence in a NoSQL Elasticsearch database. This study compares data persistence in a standard relational…
Figure 1
Figure 2
Figure 3
Figure 4
Taxonomy
Topics: Artificial Intelligence in Healthcare · Electronic Health Records Systems · Machine Learning in Healthcare
