Enhancing Clinical Data Warehouses with Provenance and Large File Management: The gitOmmix Approach for Clinical Omics Data
Maxime Wack (CRC, HeKA, HEGP, CHNO), Adrien Coulet (CRC, HeKA), Anita Burgun (HEGP, Imagine), Bastien Rance (UPCité, HEGP, CRC, HeKA)

TL;DR
gitOmmix enhances clinical data warehouses by integrating large file management and provenance tracking, enabling detailed, traceable documentation of medical omics data and analyses, thus improving data reuse and longitudinal study capabilities.
Contribution
The paper introduces gitOmmix, a novel approach combining git, git-annex, and PROV-O to support large file management and provenance in clinical data warehouses.
Findings
Supports tracing data back to patient samples.
Enables querying and browsing of provenance relationships.
Scales to large files and is system-agnostic.
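To make the idea of tracing data back to a patient sample concrete, here is a minimal sketch in plain Python. It is not gitOmmix's implementation: the entity names, the triple list, and the `trace_back` helper are all hypothetical, and the only element taken from PROV-O is the `prov:wasDerivedFrom` property. The sketch assumes provenance is available as (subject, predicate, object) triples and walks the derivation links from a result back to its origin.

```python
from collections import defaultdict

# Hypothetical provenance triples (subject, PROV-O predicate, object).
# The identifiers are invented for illustration only.
TRIPLES = [
    ("sample:S1", "prov:wasDerivedFrom", "patient:P1"),
    ("data:reads.fastq", "prov:wasDerivedFrom", "sample:S1"),
    ("data:variants.vcf", "prov:wasDerivedFrom", "data:reads.fastq"),
    ("result:report", "prov:wasDerivedFrom", "data:variants.vcf"),
]


def build_index(triples):
    """Map each entity to the entities it was directly derived from."""
    index = defaultdict(list)
    for subj, pred, obj in triples:
        if pred == "prov:wasDerivedFrom":
            index[subj].append(obj)
    return index


def trace_back(entity, triples):
    """Return every ancestor reachable via prov:wasDerivedFrom, nearest first."""
    index = build_index(triples)
    seen, order, queue = {entity}, [], [entity]
    while queue:
        current = queue.pop(0)
        for parent in index.get(current, []):
            if parent not in seen:  # guard against cycles and duplicates
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


lineage = trace_back("result:report", TRIPLES)
print(lineage)
# ['data:variants.vcf', 'data:reads.fastq', 'sample:S1', 'patient:P1']
```

A real deployment would more likely store such triples in an RDF graph and query them with SPARQL; the breadth-first walk above only illustrates the kind of lineage query that provenance relationships make possible.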
Abstract
Background. Clinical data warehouses (CDWs) are essential for the reuse of hospital data in observational studies or predictive modeling. However, state-of-the-art CDW systems present two drawbacks. First, they do not support the management of large data files, which is critical in medical genomics, radiology, digital pathology, and other domains where such files are generated. Second, they do not provide provenance management or means to represent longitudinal relationships between patient events. Indeed, a disease diagnosis and its follow-up rely on multiple analyses. In these cases, no relationship between the data (e.g., a large file) and its associated analysis and decision can be documented.
Method. We introduce gitOmmix, an approach that overcomes these limitations, and illustrate its usefulness in the management of medical omics data. gitOmmix relies on (i) a file versioning system:…
