Astrogenomics: big data, old problems, old solutions?
Aaron Golden, S. George Djorgovski, John M. Greally

TL;DR
This paper discusses the challenges of managing and interpreting the vast volumes of genomic data generated in modern biology. It emphasizes the need for integrated approaches to understanding gene function and enabling personalized medicine, and suggests that data management strategies from astronomy might offer solutions.
Contribution
The paper highlights the persistent big data challenges in genomics and proposes exploring solutions inspired by astronomical data handling to improve data integration and analysis.
Findings
Genomic data deluge surpasses previous challenges.
Integration of sequence, transcriptional, and epigenetic data is essential.
Astronomical data management strategies could inform genomics.
Abstract
The ominous warnings of a 'data deluge' in the life sciences from high-throughput DNA sequencing data are being supplanted by a second deluge, of clichés bemoaning our collective scientific fate unless we address the genomic data 'tsunami'. It is imperative that we explore the many facets of the genome, not just sequence but also transcriptional and epigenetic variability, integrating these observations in order to attain a genuine understanding of how genes function, towards a goal of genomics-based personalized medicine. Determining any individual's genomic properties requires comparison to many others, sifting out the specific from the trends, requiring access to the many in order to yield information relevant to the few. This is the central big data challenge in genomics that still requires some sort of resolution. Is there a practical, feasible way of directly connecting the…
Taxonomy
Topics: Scientific Computing and Data Management · Genomics and Phylogenetic Studies · Research Data Management Practices
