A Unified Statistical Framework for Single Cell and Bulk RNA Sequencing Data
Lingxue Zhu, Jing Lei, Bernie Devlin, Kathryn Roeder

TL;DR
This paper introduces a hierarchical statistical model that integrates single cell and bulk RNA sequencing data, effectively addressing dropout noise and improving gene expression estimation across cell types.
Contribution
The paper presents URSM, a unified framework that models both data types and dropout events, enhancing accuracy in gene expression analysis and cell type deconvolution.
Findings
URSM outperforms existing methods in dropout correction.
URSM accurately estimates cell type proportions in bulk data.
Application to fetal brain data reveals meaningful cell-specific expression patterns.
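The second finding concerns cell type deconvolution, which the toy sketch below illustrates. It is not URSM's estimator. It assumes only two cell types with known expression profiles and a noise-free linear mixture, and it recovers the mixing proportion by ordinary least squares. The names `estimate_proportion`, `profile_a`, and `profile_b` are hypothetical and chosen for illustration.

```python
def estimate_proportion(bulk, profile_a, profile_b):
    """Least-squares estimate of p in bulk ~ p*profile_a + (1-p)*profile_b.

    Toy two-cell-type deconvolution; the result is clipped to [0, 1].
    """
    diff = [a - b for a, b in zip(profile_a, profile_b)]
    resid = [y - b for y, b in zip(bulk, profile_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("profiles are identical; proportion is not identifiable")
    p = sum(d * r for d, r in zip(diff, resid)) / denom
    return min(1.0, max(0.0, p))


# Hypothetical per-gene mean expression for two cell types (assumed values).
profile_a = [10.0, 2.0, 0.5, 8.0]
profile_b = [1.0, 9.0, 6.0, 8.0]

# Build a synthetic bulk sample as a 30/70 mixture of the two profiles.
true_p = 0.3
bulk = [true_p * a + (1 - true_p) * b for a, b in zip(profile_a, profile_b)]

p_hat = estimate_proportion(bulk, profile_a, profile_b)
```

Genes with identical expression in both cell types (the last gene here) contribute nothing to the estimate, which is why deconvolution relies on genes that differ between cell types.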
Abstract
Recent advances in technology have enabled the measurement of RNA levels for individual cells. Compared to traditional tissue-level bulk RNA-seq data, single cell sequencing yields valuable insights about gene expression profiles for different cell types, which is potentially critical for understanding many complex human diseases. However, developing quantitative tools for such data remains challenging because of high levels of technical noise, especially the "dropout" events. A "dropout" happens when the RNA for a gene fails to be amplified prior to sequencing, producing a "false" zero in the observed data. In this paper, we propose a Unified RNA-Sequencing Model (URSM) for both single cell and bulk RNA-seq data, formulated as a hierarchical model. URSM borrows the strength from both data sources and carefully models the dropouts in single cell data, leading to a more accurate…
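The dropout mechanism described in the abstract can be made concrete with a small simulation. This is a toy sketch, not the model from the paper: it assumes each gene's count is zeroed independently with one constant probability, and the function name `simulate_dropout` and the example counts are hypothetical.

```python
import random


def simulate_dropout(true_counts, dropout_prob, seed=0):
    """Zero each count independently with probability dropout_prob (toy model)."""
    rng = random.Random(seed)
    return [0 if rng.random() < dropout_prob else c for c in true_counts]


# Hypothetical true expression counts for eight genes in one cell;
# two of the genes are genuinely not expressed (true zeros).
true_counts = [12, 0, 7, 30, 5, 0, 18, 9]

observed = simulate_dropout(true_counts, dropout_prob=0.4)

true_zeros = sum(c == 0 for c in true_counts)
observed_zeros = sum(c == 0 for c in observed)

# Zeros in the observed data that were nonzero in the truth are "false" zeros.
false_zeros = observed_zeros - true_zeros
```

In the observed vector alone, a false zero cannot be told apart from a gene that is truly unexpressed, which is the difficulty the abstract points to.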
