High-Dimensional Overdispersed Generalized Factor Model with Application to Single-Cell Sequencing Data Analysis
Jinyu Nie, Zhilong Qin, Wei Liu

TL;DR
This paper introduces OverGFM, a novel high-dimensional nonlinear factor model tailored for overdispersed mixed-type data, with a new variational EM algorithm and a criterion for selecting the number of factors, validated through simulations and genomics data.
Contribution
The paper develops OverGFM, a new overdispersed generalized factor model with an efficient variational EM algorithm for high-dimensional mixed data analysis.
Findings
In the reported simulations, OverGFM outperforms existing methods in both accuracy and computational efficiency.
The proposed criterion effectively determines the number of factors.
Application to genomics data demonstrates practical utility.
Abstract
The current high-dimensional linear factor models fail to account for the different types of variables, while high-dimensional nonlinear factor models often overlook the overdispersion present in mixed-type data. However, overdispersion is prevalent in practical applications, particularly in fields like biomedical and genomics studies. To address this practical demand, we propose an overdispersed generalized factor model (OverGFM) for performing high-dimensional nonlinear factor analysis on overdispersed mixed-type data. Our approach incorporates an additional error term to capture the overdispersion that cannot be accounted for by factors alone. However, this introduces significant computational challenges due to the involvement of two high-dimensional latent random matrices in the nonlinear model. To overcome these challenges, we propose a novel variational EM algorithm that…
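The role of the additional error term can be illustrated with a small simulation. The sketch below is not the authors' implementation: it assumes a Poisson variable with a log link and a binary variable with a logit link, and the loadings, intercepts, and function names (`simulate`, `dispersion_index`, `sample_poisson`) are hypothetical choices made only for illustration. Each linear predictor is an intercept plus a factor contribution plus a Gaussian error term, and setting `sigma = 0` removes that term.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson variate with Knuth's multiplication method.

    Adequate for the small rates used in this sketch.
    """
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate(n, q, sigma, seed=0):
    """Simulate n observations of one count and one binary variable.

    Assumed linear predictor: eta_ij = mu_j + b_j . h_i + eps_ij, where
    h_i is a q-dimensional standard normal factor and
    eps_ij ~ N(0, sigma^2) is the extra error term.
    """
    rng = random.Random(seed)
    b_count = [0.3] * q              # hypothetical loadings (count variable)
    b_binary = [0.5] * q             # hypothetical loadings (binary variable)
    mu_count, mu_binary = 0.5, 0.0   # hypothetical intercepts
    counts, binaries = [], []
    for _ in range(n):
        h = [rng.gauss(0.0, 1.0) for _ in range(q)]
        eta_c = mu_count + sum(b * f for b, f in zip(b_count, h))
        eta_b = mu_binary + sum(b * f for b, f in zip(b_binary, h))
        eta_c += rng.gauss(0.0, sigma)   # extra error term, count variable
        eta_b += rng.gauss(0.0, sigma)   # extra error term, binary variable
        counts.append(sample_poisson(math.exp(eta_c), rng))   # log link
        prob = 1.0 / (1.0 + math.exp(-eta_b))                 # logit link
        binaries.append(1 if rng.random() < prob else 0)
    return counts, binaries


def dispersion_index(values):
    """Sample variance divided by sample mean (1 for a pure Poisson)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return var / mean


if __name__ == "__main__":
    base, _ = simulate(n=5000, q=2, sigma=0.0)
    over, _ = simulate(n=5000, q=2, sigma=0.7)
    print("dispersion without error term:", dispersion_index(base))
    print("dispersion with error term:   ", dispersion_index(over))
```

Under these assumptions the random factors already push the marginal dispersion index of the count variable above 1, and the error term raises it further, which is the sense in which it captures overdispersion that the factors alone do not.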
Taxonomy
Topics: Gene expression and cancer classification · Cancer Genomics and Diagnostics
