Big Data Technology Accelerate Genomics Precision Medicine
Hao Li

TL;DR
This paper discusses how Intel's big data technologies, including GenomicsDB and the Lustre filesystem, accelerate genomics research by improving data storage and analysis, as validated in real-world use in China.
Contribution
It introduces a scalable architecture leveraging Intel's big data tools for efficient genomics data management and analysis, validated in real-world research institutions.
Findings
Architecture validated at BGI China and Shanghai Children Hospital
Enhanced data query and storage performance for genomics
Scalable solution applicable to global genomics research
Abstract
In genomics and life science research, the volume of whole-genome data and of data processed by life science algorithms keeps growing and is now measured in TB, PB, or even EB. The key problem is how to store and analyze this data efficiently. This paper demonstrates how Intel's big data technology and architecture facilitate and accelerate genomics and life science research in data storage and utilization. Intel defines a high-performance GenomicsDB for querying variant call data, and a Lustre filesystem with Hierarchical Storage Management for storing genomics data. Based on these technologies, Intel defines an architecture for sharing and exchanging genomics knowledge, which has been deployed and validated at BGI China and Shanghai Children Hospital with very positive feedback. These big data technologies can be scaled to many more genomics and life science partners around the world.
Taxonomy
Topics: Gene expression and cancer classification · Artificial Intelligence in Healthcare · Genetics, Bioinformatics, and Biomedical Research
