Statistical Modeling of RNA-Seq Data

Julia Salzman; Hui Jiang; Wing Hung Wong

arXiv:1106.3211·stat.ME·June 17, 2011

TL;DR

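This paper presents a statistical model for estimating isoform abundance from RNA-Seq data. The model handles both single end and paired end reads as well as sampling bias along the transcript, and the authors show that paired end sequencing yields more accurate estimates than single end sequencing at fixed sequencing depth.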

Contribution

It introduces a flexible statistical model for RNA-Seq data, a computationally feasible maximum likelihood estimator built on the model's minimal sufficient statistics, and a comparison of paired end and single end data at fixed sequencing depth.

Findings

01. Paired end RNA-Seq provides more accurate isoform abundance estimates than single end sequencing at fixed sequencing depth.

02. The model accounts for sampling bias along the length of the transcript.

03. Simulation studies validate the model's effectiveness.

Abstract

Recently, ultra high-throughput sequencing of RNA (RNA-Seq) has been developed as an approach for analysis of gene expression. By obtaining tens or even hundreds of millions of reads of transcribed sequences, an RNA-Seq experiment can offer a comprehensive survey of the population of genes (transcripts) in any sample of interest. This paper introduces a statistical model for estimating isoform abundance from RNA-Seq data that is flexible enough to accommodate both single end and paired end RNA-Seq data as well as sampling bias along the length of the transcript. Based on the derivation of minimal sufficient statistics for the model, a computationally feasible implementation of the maximum likelihood estimator of the model is provided. Further, it is shown that using paired end RNA-Seq provides more accurate isoform abundance estimates than single end sequencing at fixed sequencing depth.…
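
To make the idea of a maximum likelihood estimator for isoform abundance concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes (the abstract does not say so) that reads are grouped into categories whose counts are independent Poisson variables with mean sum_i theta[i] * rates[i][j], where theta[i] is the abundance of isoform i and rates[i][j] is a known sampling rate. The optimizer is a standard EM update for this kind of Poisson model, swapped in for illustration; the function names, the rate matrix, and the toy counts are all hypothetical.

```python
import math


def poisson_loglik(theta, rates, counts):
    """Poisson log-likelihood of the counts, up to an additive constant.

    Assumed model (hypothetical): counts[j] ~ Poisson(sum_i theta[i] * rates[i][j]).
    """
    total = 0.0
    for j, n in enumerate(counts):
        lam = sum(theta[i] * rates[i][j] for i in range(len(theta)))
        if n > 0:
            if lam <= 0.0:
                return float("-inf")
            total += n * math.log(lam)
        total -= lam
    return total


def em_isoform_abundance(rates, counts, n_iter=500, tol=1e-10):
    """Estimate isoform abundances by EM under the assumed Poisson model.

    rates[i][j]: assumed sampling rate of isoform i for read category j.
    counts[j]:   observed number of reads in category j.
    """
    n_iso = len(rates)
    n_cat = len(counts)
    row_sums = [sum(row) for row in rates]
    # Start from a uniform split of the total read count across isoforms.
    theta = [sum(counts) / (n_iso * s) if s > 0 else 0.0 for s in row_sums]
    for _ in range(n_iter):
        # Expected count in each category under the current abundances.
        lam = [sum(theta[i] * rates[i][j] for i in range(n_iso))
               for j in range(n_cat)]
        new_theta = []
        for i in range(n_iso):
            if row_sums[i] == 0:
                new_theta.append(0.0)
                continue
            # Share of the observed reads attributed to isoform i.
            resp = sum(counts[j] * rates[i][j] / lam[j]
                       for j in range(n_cat)
                       if counts[j] > 0 and lam[j] > 0)
            new_theta.append(theta[i] * resp / row_sums[i])
        delta = max(abs(a - b) for a, b in zip(theta, new_theta))
        theta = new_theta
        if delta < tol:
            break
    return theta


# Toy example: two isoforms sharing category 0; categories 1 and 2 are
# specific to isoform 0 and isoform 1 respectively.
toy_rates = [[1.0, 1.0, 0.0],
             [1.0, 0.0, 1.0]]
toy_counts = [30, 10, 20]
toy_theta = em_isoform_abundance(toy_rates, toy_counts)
```

In this toy setup the counts can be matched exactly by abundances of 10 and 20, so the estimate should approach those values. Paired end data would enter such a sketch only through a finer set of read categories and a different rate matrix; how the paper defines those is not stated in the abstract.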
