Estimation of the number of spikes using a generalized spike population model and application to RNA-seq data
Hyo Young Choi, J. S. Marron

TL;DR
This paper introduces a new algorithm for estimating the number of spikes in data using a generalized spike population model, specifically applied to RNA-seq data, addressing limitations of existing methods.
Contribution
It proposes a novel algorithm for estimating the number of spikes under a generalized spike population model, together with a new noise model tailored to RNA-seq data, so that the estimated number of spikes is biologically reasonable.
Findings
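Most existing methods based on Johnstone's spike population model select far too many spikes in RNA-seq gene expression data, or declare every component a spike; the new algorithm is designed to avoid this.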
The proposed noise model yields biologically reasonable spike counts.
A graphical tool helps evaluate the noise model's performance.
Abstract
Although a generalized spike population model has been actively studied in random matrix theory, its application to real data has rarely been explored. We find that most methods for determining the number of spikes based on Johnstone's spike population model choose far too many spikes in RNA-seq gene expression data, or fail to determine the number of spikes at all by indicating that all components are spikes. In this paper, we propose a new algorithm for estimating the number of spikes based on a generalized spike population model. We also suggest a new noise model for RNA-seq data based on population spectral distribution ideas, which yields a biologically reasonable number of spikes when used with the proposed algorithm. Furthermore, we propose a graphical tool for assessing the performance of the underlying noise model.
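For context, the kind of baseline the abstract contrasts against can be sketched as follows. This is not the authors' algorithm; it is a minimal illustration of spike counting under Johnstone's model, assuming mean-zero data, white noise with known variance `sigma2`, and a simple rule that counts sample covariance eigenvalues above the Marchenko-Pastur upper edge. The function names and the `slack` margin are hypothetical choices made for this example.

```python
import numpy as np

def mp_upper_edge(n, p, sigma2=1.0):
    """Upper edge of the Marchenko-Pastur bulk for white noise of variance sigma2."""
    return sigma2 * (1.0 + np.sqrt(p / n)) ** 2

def count_spikes(X, sigma2=1.0, slack=0.1):
    """Count sample covariance eigenvalues above the (slightly inflated) bulk edge.

    X is an (n, p) data matrix assumed to have mean zero. `slack` is a
    hypothetical safety margin that absorbs finite-sample fluctuation of the
    largest noise eigenvalue; it is not part of any published method.
    """
    n, p = X.shape
    eigvals = np.linalg.eigvalsh(X.T @ X / n)
    threshold = mp_upper_edge(n, p, sigma2) * (1.0 + slack)
    return int(np.sum(eigvals > threshold))

# Simulated data: identity covariance plus three spiked directions.
rng = np.random.default_rng(0)
n, p = 2000, 200
pop_eigs = np.ones(p)
pop_eigs[:3] = [10.0, 6.0, 4.0]
X = rng.standard_normal((n, p)) * np.sqrt(pop_eigs)

n_spikes = count_spikes(X)
print(n_spikes)
```

The threshold here is only meaningful when the non-spike population eigenvalues are all equal. A generalized spike population model relaxes that assumption by allowing the non-spike eigenvalues to follow a general population spectral distribution, which is the setting the paper works in.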
Taxonomy
Topics: Random Matrices and Applications · Bayesian Methods and Mixture Models · Stochastic processes and statistical mechanics
