Generalized simultaneous component analysis of binary and quantitative data
Yipeng Song, Johan A. Westerhuis, Nanne Aben, Lodewyk F.A. Wessels, Patrick J.F. Groenen, Age K. Smilde

TL;DR
This paper introduces the generalized simultaneous component analysis (GSCA) model for integrating binary and quantitative data, addresses overfitting with a concave nuclear norm penalty, and demonstrates the model's effectiveness through simulations and an analysis of real gene data.
Contribution
The paper develops a novel GSCA model that combines binary and quantitative data analysis within a maximum likelihood framework, incorporating a concave penalty to prevent overfitting.
Findings
GSCA effectively recovers underlying data structures in simulations.
The model performs well with low signal-to-noise ratios.
Application to gene data demonstrates practical utility.
Abstract
In the current era of systems biological research there is a need for the integrative analysis of binary and quantitative genomics data sets measured on the same objects. One standard tool for exploring the underlying dependence structure present in multiple quantitative data sets is the simultaneous component analysis (SCA) model. However, it has no provisions for the case where part of the data is binary. To this end, we propose the generalized SCA (GSCA) model, which takes into account the distinct mathematical properties of binary and quantitative measurements in the maximum likelihood framework. As in the SCA model, a common low dimensional subspace is assumed to represent the shared information between these two distinct types of measurements. However, the GSCA model can easily be overfitted when a rank larger than one is used, leading to some of the estimated parameters to become…
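To make the modelling idea in the abstract concrete, the following is a minimal sketch of a joint likelihood in which one shared set of low-dimensional scores drives both a binary block and a quantitative block. It is an illustration under stated assumptions, not the authors' implementation: the logit link for the binary block, the Gaussian noise model with a single variance `sigma2`, and all function and variable names are assumptions introduced here, and the concave nuclear norm penalty is not included.

```python
import math
import random


def linear_predictor(scores, loadings, offsets):
    """Return Theta = offsets + scores @ loadings^T as nested lists."""
    return [
        [mu + sum(a * b for a, b in zip(row, load))
         for load, mu in zip(loadings, offsets)]
        for row in scores
    ]


def log1p_exp(t):
    """Numerically stable log(1 + exp(t))."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def bernoulli_nll(x, theta):
    """Negative log-likelihood of binary data (assumed logit link)."""
    return sum(
        log1p_exp(t) - xi * t
        for x_row, t_row in zip(x, theta)
        for xi, t in zip(x_row, t_row)
    )


def gaussian_nll(x, theta, sigma2):
    """Negative log-likelihood of quantitative data (assumed Gaussian noise)."""
    const = 0.5 * math.log(2.0 * math.pi * sigma2)
    return sum(
        (xi - t) ** 2 / (2.0 * sigma2) + const
        for x_row, t_row in zip(x, theta)
        for xi, t in zip(x_row, t_row)
    )


def joint_nll(x_bin, x_quant, scores, load_bin, load_quant,
              mu_bin, mu_quant, sigma2):
    """Sum of both blocks' losses; the same scores appear in both terms."""
    theta_bin = linear_predictor(scores, load_bin, mu_bin)
    theta_quant = linear_predictor(scores, load_quant, mu_quant)
    return (bernoulli_nll(x_bin, theta_bin)
            + gaussian_nll(x_quant, theta_quant, sigma2))


# Simulate a small example: 20 objects, 5 binary and 4 quantitative variables.
rng = random.Random(0)
n, j_bin, j_quant, rank, sigma2 = 20, 5, 4, 2, 1.0
scores = [[rng.gauss(0, 1) for _ in range(rank)] for _ in range(n)]
load_bin = [[rng.gauss(0, 1) for _ in range(rank)] for _ in range(j_bin)]
load_quant = [[rng.gauss(0, 1) for _ in range(rank)] for _ in range(j_quant)]
mu_bin, mu_quant = [0.0] * j_bin, [0.0] * j_quant

theta_bin = linear_predictor(scores, load_bin, mu_bin)
theta_quant = linear_predictor(scores, load_quant, mu_quant)
# Binary entries are Bernoulli draws with success probability sigmoid(theta).
x_bin = [[1 if rng.random() < 1.0 / (1.0 + math.exp(-t)) else 0 for t in row]
         for row in theta_bin]
# Quantitative entries are the linear predictor plus Gaussian noise.
x_quant = [[t + rng.gauss(0, math.sqrt(sigma2)) for t in row]
           for row in theta_quant]

nll = joint_nll(x_bin, x_quant, scores, load_bin, load_quant,
                mu_bin, mu_quant, sigma2)
print(f"joint negative log-likelihood at the generating parameters: {nll:.2f}")
```

Because `scores` enters both terms of `joint_nll`, minimizing it over the scores and loadings ties the two data types to one common subspace. Fitting it without any penalty is where the overfitting described in the abstract would be expected to appear.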
Taxonomy
Topics: Gene expression and cancer classification · Genomic variations and chromosomal abnormalities · Genomics and Chromatin Dynamics
