High-Dimensional Multi-Study Robust Factor Model for Analyzing RNA Sequencing Data from Heterogeneous Sources

Xiaolu Jiang; Wei Liu

arXiv:2506.18478·stat.AP·June 24, 2025

TL;DR

This paper introduces MultiRFM, a robust factor model for high-dimensional RNA sequencing data collected from multiple sources. It accounts for heterogeneity among sources and for heavy-tailed technical noise, with the aim of improving feature extraction and downstream analysis.

Contribution

The paper presents a novel multi-study robust factor model that accounts for heterogeneity and heavy-tailed errors, with a new variational estimation method and a step-wise singular value ratio for tuning.
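
To illustrate the general idea behind a singular value ratio criterion, here is a minimal sketch of the plain (non-step-wise) version: it picks the number of factors at the largest drop between consecutive singular values. This is not the paper's step-wise procedure, whose details are not given on this page. The function name `singular_value_ratio_rank` and the parameter `k_max` are hypothetical names chosen for this example.

```python
import numpy as np

def singular_value_ratio_rank(M, k_max):
    """Pick a rank by maximizing d_k / d_{k+1} over k = 1..k_max.

    Generic ratio rule (an assumption for illustration), not the
    step-wise variant proposed in the paper.
    """
    d = np.linalg.svd(M, compute_uv=False)   # singular values, descending
    k_max = min(k_max, len(d) - 1)           # need d_{k+1} to exist
    ratios = d[:k_max] / d[1:k_max + 1]      # ratios[k-1] = d_k / d_{k+1}
    return int(np.argmax(ratios)) + 1, ratios

# Usage: a rank-3 signal plus small Gaussian noise.
rng = np.random.default_rng(0)
signal = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 30))
noisy = signal + 0.01 * rng.normal(size=(100, 30))
k_hat, ratios = singular_value_ratio_rank(noisy, k_max=8)
```

The ratio rule needs only one SVD and no cross-validation, which is one reason ratio-type criteria are popular for choosing the number of factors in large matrices.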

Findings

01 MultiRFM outperforms existing models in simulation accuracy.

02 It demonstrates superior real-world data fitting and prediction.

03 The method is computationally efficient for large-scale data.

Abstract

The amount of high-dimensional large-scale RNA sequencing data derived from multiple heterogeneous sources has increased exponentially in biological science. During data collection, significant technical noise or errors may occur. To robustly extract meaningful features from this type of data, we introduce a high-dimensional multi-study robust factor model, called MultiRFM, which learns latent features and accounts for the heterogeneity among sources. MultiRFM demonstrates significantly greater robustness compared to existing multi-study factor models and is capable of estimating study-specific factors that are overlooked by single-study robust factor models. Specifically, we utilize a multivariate t-distribution to model errors, capturing potential heavy tails, and incorporate both study-shared and study-specified factors to represent common and specific information among studies. For…
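
The model structure described in the abstract (study-shared factors, study-specified factors, multivariate t errors) can be made concrete with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `simulate_multistudy_t`, the Gaussian loadings and factors, the identity scale matrix for the errors, and the default `nu=3.0` are all choices made for this example. The t-distributed errors use the standard construction of a Gaussian vector divided by the square root of an independent chi-square variable over its degrees of freedom.

```python
import numpy as np

def simulate_multistudy_t(n_list, p, q, qs_list, nu=3.0, seed=0):
    """Simulate X_s = F_s A^T + H_s B_s^T + E_s for each study s.

    A   : (p, q)   loadings shared by all studies (assumed Gaussian)
    B_s : (p, q_s) loadings specific to study s (assumed Gaussian)
    E_s : rows are multivariate t with nu degrees of freedom
    """
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(p, q))
    studies = []
    for n_s, q_s in zip(n_list, qs_list):
        B_s = rng.normal(size=(p, q_s))
        F = rng.normal(size=(n_s, q))        # study-shared factors
        H = rng.normal(size=(n_s, q_s))      # study-specified factors
        Z = rng.normal(size=(n_s, p))
        g = rng.chisquare(nu, size=(n_s, 1)) # one mixing draw per sample
        E = Z / np.sqrt(g / nu)              # heavy-tailed t errors
        studies.append(F @ A.T + H @ B_s.T + E)
    return studies, A

# Usage: two studies with 50 and 80 samples, 20 genes each.
studies, A = simulate_multistudy_t([50, 80], p=20, q=3, qs_list=[2, 1])
```

A small `nu` gives heavier tails, so occasional samples carry very large errors across all genes at once; this is the kind of contamination a Gaussian error model handles poorly.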
