High-Dimensional Multi-Study Robust Factor Model for Analyzing RNA Sequencing Data from Heterogeneous Sources
Xiaolu Jiang, Wei Liu

TL;DR
This paper introduces MultiRFM, a robust factor model tailored for high-dimensional, multi-source RNA sequencing data, effectively handling heterogeneity and technical noise to improve feature extraction and downstream analysis.
Contribution
The paper presents a novel multi-study robust factor model that accounts for heterogeneity and heavy-tailed errors, with a new variational estimation method and a step-wise singular value ratio for tuning.
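To illustrate the idea of picking the number of factors from singular value ratios, here is a minimal sketch of a generic ratio criterion. It is not the paper's step-wise procedure, whose details are not given here; the function name `singular_value_ratio_rank` and the cap `q_max` are hypothetical choices for illustration.

```python
import numpy as np

def singular_value_ratio_rank(X, q_max=10):
    """Pick a number of factors by the largest ratio of consecutive singular values.

    Generic ratio criterion (an assumption, not the paper's step-wise variant).
    X is an (n_samples, n_features) matrix.
    """
    # Center columns so the mean does not show up as a spurious factor.
    Xc = X - X.mean(axis=0, keepdims=True)
    # Singular values, returned in descending order.
    s = np.linalg.svd(Xc, compute_uv=False)
    # Keep the candidate range inside the available singular values.
    q_max = min(q_max, len(s) - 1)
    # ratios[k] = s_{k+1} / s_{k+2} in 1-based terms; a large gap marks the cut.
    ratios = s[:q_max] / s[1:q_max + 1]
    return int(np.argmax(ratios)) + 1
```

The criterion looks for the sharpest drop in the singular value spectrum, which is why it needs no tuning beyond the upper bound `q_max`.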
Findings
MultiRFM outperforms existing models in simulation accuracy.
It demonstrates superior real-world data fitting and prediction.
The method is computationally efficient for large-scale data.
Abstract
The amount of high-dimensional large-scale RNA sequencing data derived from multiple heterogeneous sources has increased exponentially in biological science. During data collection, significant technical noise or errors may occur. To robustly extract meaningful features from this type of data, we introduce a high-dimensional multi-study robust factor model, called MultiRFM, which learns latent features and accounts for the heterogeneity among sources. MultiRFM demonstrates significantly greater robustness compared to existing multi-study factor models and is capable of estimating study-specific factors that are overlooked by single-study robust factor models. Specifically, we utilize a multivariate t-distribution to model errors, capturing potential heavy tails, and incorporate both study-shared and study-specified factors to represent common and specific information among studies. For…
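The structure described in the abstract (shared factors, study-specific factors, heavy-tailed errors) can be sketched as a small simulation. This is a sketch under stated assumptions, not the authors' code: it assumes the usual additive form x = A f + B_s h + e, with A shared across studies and B_s specific to study s, and it draws multivariate t errors as a Gaussian vector scaled by a chi-square variable. The function name, the dimensions, and the degrees of freedom `nu` are all hypothetical.

```python
import numpy as np

def simulate_multi_study(n_per_study=(50, 60), p=20, q=3, q_s=(2, 2), nu=3.0, seed=0):
    """Simulate data from an assumed multi-study factor model with t-distributed errors."""
    rng = np.random.default_rng(seed)
    # Loadings shared by every study (p features, q shared factors).
    A = rng.normal(size=(p, q))
    studies = []
    for n_s, qs in zip(n_per_study, q_s):
        # Loadings specific to this study.
        B_s = rng.normal(size=(p, qs))
        # Shared and study-specific latent factors for each sample.
        F = rng.normal(size=(n_s, q))
        H = rng.normal(size=(n_s, qs))
        # Multivariate t errors: one chi-square scale per sample, applied to a
        # standard Gaussian vector, which gives heavy tails for small nu.
        z = rng.normal(size=(n_s, p))
        w = rng.chisquare(nu, size=(n_s, 1))
        eps = z * np.sqrt(nu / w)
        studies.append(F @ A.T + H @ B_s.T + eps)
    return A, studies
```

Each returned matrix has one row per sample and one column per feature, so data simulated this way can be passed to any factor model routine to see how it copes with heavy-tailed noise.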
