Gene expression modelling across multiple cell-lines with MapReduce
David M. Budden, Edmund J. Crampin

TL;DR
This paper presents a scalable MapReduce-based method for gene expression modeling that efficiently analyzes large multi-cell line datasets, revealing invariant relationships between histone modifications and transcription.
Contribution
The paper introduces a parallelized MapReduce framework for gene expression modeling and applies it to more than fifty histone modification datasets spanning cancerous and non-cancerous cell lines.
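As a rough illustration of how a regression-style expression model can be split into map and reduce steps, here is a minimal sketch. It assumes an ordinary least-squares model of expression on histone modification scores, which this summary does not confirm to be the authors' exact formulation. All names (`map_chunk`, `reduce_stats`, `fit`) and the toy data are hypothetical. Each mapper computes the partial sums X^T X and X^T y for its chunk of genes, the reducer adds those sums, and the small normal-equation system is solved once at the end.

```python
from functools import reduce

def map_chunk(rows):
    """Map step: turn one chunk of (features, expression) rows into partial sums."""
    p = len(rows[0][0]) + 1  # +1 for the intercept column
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for features, y in rows:
        x = [1.0] + list(features)
        for i in range(p):
            xty[i] += x[i] * y
            for j in range(p):
                xtx[i][j] += x[i] * x[j]
    return xtx, xty

def reduce_stats(a, b):
    """Reduce step: element-wise sum of two partial results."""
    xtx = [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a[0], b[0])]
    xty = [u + v for u, v in zip(a[1], b[1])]
    return xtx, xty

def solve(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting.
    Assumes the system is non-singular (no check is performed)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - s) / m[i][i]
    return w

def fit(chunks):
    """Run the map step on every chunk, reduce the results, then solve once."""
    xtx, xty = reduce(reduce_stats, map(map_chunk, chunks))
    return solve(xtx, xty)

# Toy data (made up): expression = 1 + 2*h1 - 1*h2 for two hypothetical marks.
genes = [((h1, h2), 1.0 + 2.0 * h1 - 1.0 * h2)
         for h1 in range(4) for h2 in range(4)]
chunks = [genes[0:5], genes[5:11], genes[11:16]]
weights = fit(chunks)  # [intercept, weight for h1, weight for h2]
```

Because the partial sums are additive, the result does not depend on how genes are divided among chunks, so the map step can run on separate cores or machines.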
Findings
The genome-wide relationships between histone modifications and mRNA transcription are lineage-, tissue-, and karyotype-invariant.
Models trained on non-cancerous data can predict cancerous gene expression accurately.
The approach scales efficiently to large, multi-cell line datasets.
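The cross-cell-line finding above can be pictured as fitting a model on one cell line and scoring its predictions on another. The sketch below uses a single hypothetical histone signal, made-up numbers, and Pearson correlation plus RMSE as the scores. The metric choice and all names are assumptions for illustration, not the evaluation reported in the paper.

```python
from math import sqrt

def fit_line(x, y):
    """Least-squares slope and intercept for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)

# Made-up values standing in for a non-cancerous (training) cell line ...
train_signal = [0.0, 1.0, 2.0, 3.0, 4.0]
train_expr = [0.1, 1.9, 4.2, 5.8, 8.0]
# ... and a cancerous (held-out) cell line.
test_signal = [0.5, 1.5, 2.5, 3.5]
test_expr = [1.2, 2.7, 5.1, 7.2]

slope, intercept = fit_line(train_signal, train_expr)
predicted = [slope * s + intercept for s in test_signal]

r = pearson(predicted, test_expr)
rmse = sqrt(sum((p - t) ** 2 for p, t in zip(predicted, test_expr)) / len(test_expr))
```

With a single predictor, the correlation reflects only the sign of the fitted slope, so the RMSE is included to show how well the transferred coefficients match the held-out values in magnitude.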
Abstract
With the wealth of high-throughput sequencing data generated by recent large-scale consortia, predictive gene expression modelling has become an important tool for integrative analysis of transcriptomic and epigenetic data. However, sequencing data-sets are characteristically large, and previous modelling frameworks are typically inefficient and unable to leverage multi-core or distributed processing architectures. In this study, we detail an efficient and parallelised MapReduce implementation of gene expression modelling. We leverage the computational efficiency of this framework to provide an integrative analysis of over fifty histone modification data-sets across a variety of cancerous and non-cancerous cell-lines. Our results demonstrate that the genome-wide relationships between histone modifications and mRNA transcription are lineage, tissue and karyotype-invariant, and that…
Taxonomy
Topics: Genomics and Chromatin Dynamics · Genomics and Phylogenetic Studies · Epigenetics and DNA Methylation
