Data Lakes, Clouds and Commons: A Review of Platforms for Analyzing and Sharing Genomic Data
Robert L. Grossman

TL;DR
This paper reviews various platforms for managing, analyzing, and sharing genomic data, comparing data commons, data lakes, and data ecosystems, highlighting their features, challenges, and use cases in biomedical research.
Contribution
It provides a comprehensive comparison of data commons, data lakes, and data ecosystems for genomic data management and analysis, emphasizing their roles and differences.
Findings
Data commons integrate data with cloud infrastructure for analysis and sharing.
Data lakes offer access to data with deferred curation and analysis.
Interoperability of multiple data commons enables large-scale data ecosystems.
Abstract
Data commons collate data with cloud computing infrastructure and commonly used software services, tools and applications to create biomedical resources for the large-scale management, analysis, harmonization, and sharing of biomedical data. Over the past few years, data commons have been used to analyze, harmonize and share large-scale genomics datasets. Data ecosystems can be built by interoperating multiple data commons. It can be quite labor-intensive to curate, import and analyze the data in a data commons. Data lakes provide an alternative to data commons: they simply provide access to data, with the data curation and analysis deferred until later and delegated to those who access the data. We review software platforms for managing, analyzing and sharing genomic data, with an emphasis on data commons, but also covering data ecosystems and data lakes.
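The contrast the abstract draws between curating on import (data commons) and deferring curation to whoever reads the data (data lakes) can be illustrated with a toy sketch. Everything here is hypothetical and chosen only for illustration: the record fields, the `harmonize` rule, and the `DataCommons` and `DataLake` classes do not come from the paper or from any real platform.

```python
# Hypothetical raw records, as a submitter might deliver them (illustrative only).
RAW_RECORDS = [
    {"sample": "S1", "gene": "tp53", "expr": "12.5"},
    {"sample": "S2", "gene": "BRCA1 ", "expr": "n/a"},
    {"sample": "S3", "gene": "TP53", "expr": "7.0"},
]


def harmonize(record):
    """Normalize one raw record; return None if it cannot be curated."""
    try:
        value = float(record["expr"])
    except ValueError:
        return None  # unparseable expression value
    return {
        "sample": record["sample"],
        "gene": record["gene"].strip().upper(),
        "expr": value,
    }


class DataCommons:
    """Toy commons: curation happens at import, so queries see clean data."""

    def __init__(self):
        self.records = []

    def import_records(self, raw):
        for record in raw:
            clean = harmonize(record)
            if clean is not None:
                self.records.append(clean)

    def query(self, gene):
        return [r for r in self.records if r["gene"] == gene]


class DataLake:
    """Toy lake: records are stored as delivered; curation is left to readers."""

    def __init__(self):
        self.records = []

    def import_records(self, raw):
        self.records.extend(raw)

    def read(self):
        return list(self.records)


commons = DataCommons()
commons.import_records(RAW_RECORDS)
tp53_from_commons = commons.query("TP53")

lake = DataLake()
lake.import_records(RAW_RECORDS)
# The reader of the lake does the curation work at access time.
curated_by_reader = [h for h in map(harmonize, lake.read()) if h is not None]
tp53_from_lake = [r for r in curated_by_reader if r["gene"] == "TP53"]
```

In this sketch both paths reach the same two TP53 records, but the curation cost falls on the operator of the commons in one case and on each reader of the lake in the other, which is the trade-off the abstract describes.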
Taxonomy
Topics: Gene expression and cancer classification · Scientific Computing and Data Management · Data Quality and Management
