Principal variables analysis for non-Gaussian data
Dylan Clark-Boucher, Jeffrey W. Miller

TL;DR
This paper extends principal variables analysis (PVA) to non-Gaussian data by replacing the Pearson correlation matrix with alternatives such as Spearman, Gaussian copula, and polychoric correlations, improving variable selection accuracy on continuous non-Gaussian and ordinal data.
Contribution
It introduces a generalized PVA framework that accepts different types of correlation, improving performance on non-Gaussian and ordinal data compared to traditional Pearson-based methods.
Findings
Gaussian copula and Spearman correlations improve PVA on continuous non-Gaussian data.
Polychoric correlations outperform others on ordinal data.
Different correlation choices lead to substantially different principal variable sets.
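To make the contrast between correlation types concrete, here is a minimal sketch in plain Python. It is an illustration under assumptions, not the authors' code: the Gaussian copula correlation is approximated by the Pearson correlation of normal scores (ranks mapped through the standard normal quantile function), which is one common estimator and may differ from the one used in the paper. Polychoric correlation is omitted because it requires likelihood-based estimation. All function names are hypothetical.

```python
import math
from statistics import NormalDist, mean

def ranks(xs):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def pearson(x, y):
    """Ordinary Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def normal_scores(xs):
    """Map ranks to standard normal quantiles (assumed copula estimator)."""
    n = len(xs)
    inv = NormalDist().inv_cdf
    return [inv(r / (n + 1)) for r in ranks(xs)]

def copula_corr(x, y):
    """Normal-scores approximation to the Gaussian copula correlation."""
    return pearson(normal_scores(x), normal_scores(y))

# A monotone but strongly nonlinear relationship.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [math.exp(v) for v in x]
r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
r_copula = copula_corr(x, y)
```

On this toy example the rank-based measures treat the relationship as perfectly monotone, while Pearson correlation is pulled below 1 by the nonlinearity. This is the kind of discrepancy that can change which variables PVA selects.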
Abstract
Principal variables analysis (PVA) is a technique for selecting a subset of variables that capture as much of the information in a dataset as possible. Existing approaches for PVA are based on the Pearson correlation matrix, which is not well-suited to describing the relationships between non-Gaussian variables. We propose a generalized approach to PVA enabling the use of different types of correlation, and we explore using Spearman, Gaussian copula, and polychoric correlations as alternatives to Pearson correlation when performing PVA. We compare performance in simulation studies varying the form of the true multivariate distribution over a wide range of possibilities. Our results show that on continuous non-Gaussian data, using generalized PVA with Gaussian copula or Spearman correlations provides a major improvement in performance compared to Pearson. Meanwhile, on ordinal data,…
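The selection step described in the abstract can be sketched as a greedy search over any correlation matrix (Pearson, Spearman, Gaussian copula, or polychoric). The criterion below is an assumption: the summary does not say which PVA criterion the authors optimize, so this sketch uses one classical choice, minimizing the trace of the residual (partial) correlation matrix of the unselected variables given the selected ones. The function name `greedy_pva` and the toy matrix are hypothetical.

```python
def greedy_pva(R, k):
    """Forward-select k variables from a correlation matrix R (list of lists).

    At each step, pick the variable whose removal (partialling out)
    reduces the trace of the residual matrix the most, then update the
    residual matrix. Returns (selected indices, final residual trace).
    """
    p = len(R)
    resid = [row[:] for row in R]  # working copy; R itself is not modified
    selected = []
    for _ in range(k):
        best, best_gain = None, -1.0
        for j in range(p):
            # Skip chosen variables and ones already fully explained.
            if j in selected or resid[j][j] < 1e-12:
                continue
            # Trace reduction obtained by conditioning on variable j.
            gain = sum(resid[i][j] ** 2 for i in range(p)) / resid[j][j]
            if gain > best_gain:
                best, best_gain = j, gain
        if best is None:
            break
        selected.append(best)
        col = [resid[i][best] for i in range(p)]
        d = resid[best][best]
        # Rank-one update: residual correlation after partialling out `best`.
        for a in range(p):
            for b in range(p):
                resid[a][b] -= col[a] * col[b] / d
    return selected, sum(resid[i][i] for i in range(p))

# Toy matrix: variables 0 and 1 are strongly correlated, variable 2 is
# uncorrelated with both.
R = [
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
selected, residual_trace = greedy_pva(R, 2)
```

Because the function only needs a correlation matrix, swapping Pearson for another correlation type changes the input and nothing else. Greedy search is a simplification; it is not guaranteed to find the subset that is optimal over all subsets of size k.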
Taxonomy
Topics: Parkinson's Disease Mechanisms and Treatments · Sensory Analysis and Statistical Methods
