Grouping effects of sparse CCA models in variable selection
Kefei Liu, Qi Long, Li Shen

TL;DR
This paper investigates the grouping effects of standard and simplified sparse canonical correlation analysis models in variable selection, revealing that the simplified model tends to select entire groups of correlated variables together, unlike the standard model.
Contribution
The paper provides a theoretical analysis of the grouping behavior of both SCCA models in high-dimensional variable selection, supported by empirical validation.
Findings
Simplified SCCA selects entire variable groups together.
Standard SCCA randomly selects dominant variables within groups.
Empirical results confirm theoretical predictions.
Abstract
Sparse canonical correlation analysis (SCCA) is a bi-multivariate association model that finds sparse linear combinations of two sets of variables that are maximally correlated with each other. In addition to the standard SCCA model, a simplified SCCA criterion, which maximizes the cross-covariance between a pair of canonical variables instead of their cross-correlation, is widely used in the literature due to its computational simplicity. However, the behaviors and properties of the solutions of these two models remain unknown in theory. In this paper, we analyze the grouping effect of the standard and simplified SCCA models in variable selection. In high-dimensional settings, the variables often form groups with high within-group correlation and low between-group correlation. Our theoretical analysis shows that for grouped variable selection, the simplified SCCA jointly selects or…
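To make the distinction between the two criteria concrete, here is a minimal sketch in plain Python that evaluates both objectives for fixed weight vectors. It is not the authors' implementation: the function names, the toy data, and the weight vectors `u` and `v` are all made up for illustration, and the sparsity penalties and norm constraints that an actual SCCA solver would impose are left out.

```python
import math

def project(rows, weights):
    """Return the canonical variable: one weighted sum per sample (row)."""
    return [sum(x * w for x, w in zip(row, weights)) for row in rows]

def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / (n - 1)

def standard_objective(X, Y, u, v):
    """Cross-correlation between the canonical variables X u and Y v."""
    a, b = project(X, u), project(Y, v)
    return covariance(a, b) / math.sqrt(covariance(a, a) * covariance(b, b))

def simplified_objective(X, Y, u, v):
    """Cross-covariance between the canonical variables X u and Y v."""
    return covariance(project(X, u), project(Y, v))

# Toy data (hypothetical): 5 samples, 2 variables in each view.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0]]
Y = [[2.0, 0.0], [3.0, 1.0], [5.0, 1.0], [6.0, 2.0], [9.0, 2.0]]
u = [0.6, 0.8]  # illustrative weights for X
v = [1.0, 0.0]  # illustrative weights for Y (sparse: one variable selected)

# Doubling u leaves the correlation unchanged but doubles the covariance.
u_scaled = [2.0 * w for w in u]
corr, corr_scaled = standard_objective(X, Y, u, v), standard_objective(X, Y, u_scaled, v)
cov, cov_scaled = simplified_objective(X, Y, u, v), simplified_objective(X, Y, u_scaled, v)
print(corr, corr_scaled)
print(cov, cov_scaled)
```

The correlation divides by the standard deviations of the canonical variables, so it depends on the within-set covariance of the variables, while the covariance objective does not. This is the only structural difference the sketch is meant to show.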
Taxonomy
Topics: Gene expression and cancer classification · Genetic Mapping and Diversity in Plants and Animals · Bioinformatics and Genomic Networks
