Scalable Genomics with R and Bioconductor
Michael Lawrence, Martin Morgan

TL;DR
This paper reviews strategies for scalable analysis of large genomic datasets using R and Bioconductor, demonstrating their application in detecting genetic variants from whole genome sequencing data.
Contribution
It reviews practical strategies for scalable genomic data analysis and describes their implementation in R packages from the Bioconductor project, covering processing, summarization, and visualization.
Findings
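Restrictive queries, compression, iteration, and parallel computing are the well-established ideas behind scalable analysis of large genomic data sets.
Bioconductor packages implement these strategies in R.
The strategies are demonstrated by detecting and analyzing genetic variants from a whole genome sequencing experiment.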
Abstract
This paper reviews strategies for solving problems encountered when analyzing large genomic data sets and describes the implementation of those strategies in R by packages from the Bioconductor project. We treat the scalable processing, summarization and visualization of big genomic data. The general ideas are well established and include restrictive queries, compression, iteration and parallel computing. We demonstrate the strategies by applying Bioconductor packages to the detection and analysis of genetic variants from a whole genome sequencing experiment.
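The four strategies named in the abstract can be illustrated with a small, language-neutral sketch. It is written in Python with the standard library only and is not the Bioconductor API or the authors' implementation. The toy coverage vector, the helper names (rle_encode, rle_slice, chunked), and the chunk size are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby


def rle_encode(values):
    """Compression: collapse a sequence into (value, run_length) pairs."""
    return [(v, sum(1 for _ in group)) for v, group in groupby(values)]


def rle_slice(runs, start, end):
    """Restrictive query: return only the runs overlapping [start, end)."""
    out = []
    pos = 0
    for value, length in runs:
        run_start, run_end = pos, pos + length
        lo, hi = max(run_start, start), min(run_end, end)
        if lo < hi:
            out.append((value, hi - lo))
        pos = run_end
        if pos >= end:
            break  # nothing past the query region needs to be visited
    return out


def chunked(seq, size):
    """Iteration: yield fixed-size pieces so only one piece is handled at a time."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]


# Hypothetical per-base coverage along a tiny "chromosome" (made-up data).
coverage = [0] * 5 + [3] * 4 + [7] * 3 + [0] * 8

runs = rle_encode(coverage)       # compressed representation
region = rle_slice(runs, 4, 10)   # query positions 4..9 only

# Parallel computing: summarize each chunk independently, then combine.
with ThreadPoolExecutor(max_workers=2) as pool:
    partial_sums = list(pool.map(sum, chunked(coverage, 6)))
total = sum(partial_sums)
```

Each chunk is summarized independently and the partial results are combined at the end, so the same pattern applies whether the chunks are processed one at a time to bound memory use or spread across several workers.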
