Covariance and PCA for Categorical Variables
Hirotaka Niitsuma, Takashi Okada

TL;DR
This paper introduces a novel covariance measure for categorical variables based on regular simplex expressions and proposes a PCA method (RS-PCA) that enhances interpretability and aids variable selection in categorical datasets.
Contribution
It presents a new covariance definition for categorical data and a PCA method using regular simplex expressions, improving interpretability and variable selection.
Findings
Reasonable covariance values for test data
RS-PCA enhances interpretability of principal components
Effective variable selection criterion for categorical data
Abstract
Covariances between categorical variables are defined using a regular simplex expression for categories. The method follows Gini's definition of variance and gives the covariance as the solution of simultaneous equations. The calculated results give reasonable values for test data. A method of principal component analysis (RS-PCA) based on regular simplex expressions is also proposed, which allows easy interpretation of the principal components. The proposed methods are applied to a variable selection problem on the categorical USCensus1990 data, where they give an appropriate criterion for variable selection.
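The link between Gini's variance and a regular simplex coding can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: Gini's variance is taken as the pairwise form (1 / (2n^2)) times the number of ordered pairs with differing categories, and simplex vertices are taken as one-hot vectors scaled by 1/sqrt(2), so that any two distinct categories are exactly distance 1 apart. The function names (`gini_variance`, `simplex_encode`, `simplex_pca`) and the toy data are hypothetical, and `simplex_pca` is a plain PCA on concatenated simplex codes, not necessarily the authors' RS-PCA or their covariance definition.

```python
import numpy as np

def gini_variance(labels):
    """Pairwise variance: (1 / (2 n^2)) * count of ordered pairs with x_a != x_b."""
    x = np.asarray(labels)
    n = len(x)
    differs = x[:, None] != x[None, :]  # n x n boolean matrix
    return differs.sum() / (2.0 * n * n)

def simplex_encode(labels):
    """Map each category to a vertex of a regular simplex with unit edge length.

    Assumption: the vertices are one-hot vectors divided by sqrt(2).
    """
    x = np.asarray(labels)
    categories = np.unique(x)
    one_hot = (x[:, None] == categories[None, :]).astype(float)
    return one_hot / np.sqrt(2.0), categories

def simplex_pca(columns):
    """Ordinary PCA on concatenated simplex codes (illustrative only)."""
    z = np.hstack([simplex_encode(col)[0] for col in columns])
    z = z - z.mean(axis=0)
    cov = z.T @ z / len(z)  # population covariance of the coded data
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]  # largest variance first
    return eigvals[order], eigvecs[:, order]

# Toy data, invented for illustration.
colour = ["red", "red", "blue", "green", "blue", "red"]
size = ["S", "S", "L", "L", "L", "S"]

# Total variance of the simplex-coded points equals the pairwise Gini variance.
codes, _ = simplex_encode(colour)
centered = codes - codes.mean(axis=0)
total_variance = (centered ** 2).sum() / len(codes)
print(gini_variance(colour), total_variance)

eigvals, eigvecs = simplex_pca([colour, size])
print(eigvals)
```

Because each principal axis is a combination of simplex vertices, its loadings can be read per category, which is one way to see why a simplex-based coding keeps components interpretable.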
Taxonomy
Topics: Spectroscopy and Chemometric Analyses · Advanced Statistical Methods and Models
