Scalable Genomics with R and Bioconductor

Michael Lawrence; Martin Morgan

arXiv:1409.2864·q-bio.GN·September 11, 2014

TL;DR

This paper reviews strategies for scalable analysis of large genomic datasets using R and Bioconductor, demonstrating their application in detecting genetic variants from whole genome sequencing data.

Contribution

It reviews well-established strategies for scalable genomic data analysis and describes how Bioconductor packages implement them in R, covering processing, summarization, and visualization.

Findings

1. Scalability rests on four well-established ideas: restrictive queries, compression, iteration, and parallel computing.

2. Bioconductor packages are applied to detect and analyze genetic variants from a whole genome sequencing experiment.

3. The strategies cover the processing, summarization, and visualization of big genomic data.

Abstract

This paper reviews strategies for solving problems encountered when analyzing large genomic data sets and describes the implementation of those strategies in R by packages from the Bioconductor project. We treat the scalable processing, summarization and visualization of big genomic data. The general ideas are well established and include restrictive queries, compression, iteration and parallel computing. We demonstrate the strategies by applying Bioconductor packages to the detection and analysis of genetic variants from a whole genome sequencing experiment.
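
The abstract names its strategies without showing them, so a small sketch may help make three of them concrete: a restrictive query, iteration over bounded chunks, and compression by run-length encoding. This is a language-neutral illustration in Python and not the paper's R/Bioconductor implementation. The function names, the chunk size, and the list of read start positions are all hypothetical. Parallel computing is left out for brevity.

```python
from itertools import groupby, islice


def in_region(positions, start, end):
    """Restrictive query: lazily keep only positions inside [start, end)."""
    return (p for p in positions if start <= p < end)


def chunks(iterable, size):
    """Iteration: yield lists of at most `size` items from any iterable."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def run_length_encode(values):
    """Compression: collapse runs of equal values into (value, length) pairs."""
    return [(v, sum(1 for _ in g)) for v, g in groupby(values)]


def summarize(positions, start, end, chunk_size=2):
    """Count read starts per position in [start, end), one chunk at a time."""
    counts = [0] * (end - start)
    for chunk in chunks(in_region(positions, start, end), chunk_size):
        for p in chunk:
            counts[p - start] += 1
    return run_length_encode(counts)


# Hypothetical read start positions; only those in [10, 20) are counted.
reads = [3, 10, 10, 11, 15, 15, 15, 19, 42]
print(summarize(reads, 10, 20))
# → [(2, 1), (1, 1), (0, 3), (3, 1), (0, 3), (1, 1)]
```

Because the filter and the chunker are both lazy, only one chunk of positions is held at a time on top of the per-position counts for the queried region. The run-length encoding then shrinks long stretches of identical counts, which is where the savings would come from on sparse data.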
