SCALPEL3: a scalable open-source library for healthcare claims databases

Emmanuel Bacry; Stéphane Gaïffas; Fanny Leroy; Maryan Morel; Dinh Phong Nguyen; Youcef Sebiat; Dian Sun

arXiv:1910.07045·cs.DC·August 27, 2020


TL;DR

SCALPEL3 is a scalable, open-source framework built on Apache Spark that streamlines the analysis of large healthcare claims databases, enabling efficient data extraction, manipulation, and machine learning integration.

Contribution

It introduces a modular, scalable suite of three Spark-based libraries for healthcare claims data analysis, aimed at making studies on large observational databases faster and more reproducible.

Findings

1. Successfully processed 14.5 million patients' data in under 49 minutes
2. Enabled complex concept extraction for large observational studies
3. Provided interactive tools for cohort analysis and data flow monitoring

Abstract

This article introduces SCALPEL3, a scalable open-source framework for studies involving Large Observational Databases (LODs). Its design eases medical observational studies through abstractions that allow concept extraction, high-level cohort manipulation, and production of data formats compatible with machine learning libraries. SCALPEL3 has successfully been used on the SNDS database (see Tuppin et al. (2017)), a huge healthcare claims database that handles the reimbursement of almost all French citizens. SCALPEL3 focuses on scalability, easy interactive analysis, and helpers for data flow analysis to accelerate studies performed on LODs. It consists of three open-source libraries based on Apache Spark. SCALPEL-Flattening allows denormalization of the LOD (only SNDS for now) by sequentially joining its tables into one big table. SCALPEL-Extraction provides fast concept extraction from a big…
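The abstract describes flattening as joining tables one after another into a single big table, followed by concept extraction from that table. A minimal sketch of that idea follows, under stated assumptions: it uses plain Python lists of dicts instead of Spark so it runs without a cluster, it assumes left joins on a shared claim identifier (the abstract specifies neither the join type nor the keys), and the helpers `left_join`, `flatten`, and `extract_events` as well as the toy table and code names are hypothetical, not the SCALPEL3 API.

```python
from typing import Any, Dict, Iterable, List, Set, Tuple

Row = Dict[str, Any]


def left_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Left-join `right` onto `left` on column `key` (hypothetical helper).

    Unmatched left rows are kept unchanged; a left row with several matches
    yields one output row per match. On a column-name clash (other than `key`),
    the right table's value wins.
    """
    # Index the right table by join key for constant-time lookups.
    index: Dict[Any, List[Row]] = {}
    for rrow in right:
        index.setdefault(rrow[key], []).append(rrow)

    out: List[Row] = []
    for lrow in left:
        matches = index.get(lrow[key], [])
        if not matches:
            out.append(dict(lrow))
            continue
        for rrow in matches:
            merged = dict(lrow)
            merged.update({k: v for k, v in rrow.items() if k != key})
            out.append(merged)
    return out


def flatten(central: List[Row], joins: Iterable[Tuple[List[Row], str]]) -> List[Row]:
    """Denormalize by joining each (table, key) pair onto the central table, in order."""
    flat = central
    for table, key in joins:
        flat = left_join(flat, table, key)
    return flat


def extract_events(flat: List[Row], code_col: str, codes: Set[str]) -> List[Tuple[Any, Any, Any]]:
    """Hypothetical concept extraction: keep (patient_id, code, date) for rows whose code matches."""
    return [
        (row["patient_id"], row[code_col], row["date"])
        for row in flat
        if row.get(code_col) in codes
    ]


# Toy tables (hypothetical names and codes, not real SNDS tables).
claims = [
    {"claim_id": 1, "patient_id": "p1", "date": "2014-01-03"},
    {"claim_id": 2, "patient_id": "p2", "date": "2014-02-10"},
]
drugs = [{"claim_id": 1, "drug_code": "A10"}, {"claim_id": 2, "drug_code": "N02"}]
diagnoses = [{"claim_id": 2, "diag_code": "C67"}]

flat = flatten(claims, [(drugs, "claim_id"), (diagnoses, "claim_id")])
print(extract_events(flat, "drug_code", {"A10"}))  # [('p1', 'A10', '2014-01-03')]
```

Flattening once up front means every later extraction is a single scan with a filter instead of a fresh multi-table join. In a Spark setting, `left_join` would roughly correspond to a chain of `DataFrame.join(..., how="left")` calls; whether SCALPEL-Flattening uses exactly this join type is not stated in the abstract.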
