ClustOfVar: An R Package for the Clustering of Variables
M. Chavent, V. Kuentz, B. Liquet, L. Saracco

TL;DR
ClustOfVar is an R package that facilitates clustering of mixed quantitative and qualitative variables using correlation-based criteria and principal component analysis, aiding dimension reduction and variable selection.
Contribution
It introduces new algorithms and a bootstrap method for choosing a suitable number of clusters in mixed-variable datasets, filling a gap in existing clustering tools.
Findings
Effective clustering of mixed variables demonstrated on small datasets
Bootstrap approach helps identify an appropriate number of clusters
Algorithms outperform traditional methods in mixed data scenarios
Abstract
Clustering of variables is a way to arrange variables into homogeneous clusters, i.e., groups of variables that are strongly related to each other and thus bring the same information. These approaches can then be useful for dimension reduction and variable selection. Several specific methods have been developed for the clustering of numerical variables. However, far fewer methods have been proposed for qualitative variables or for mixtures of quantitative and qualitative variables. The R package ClustOfVar was specifically developed for this purpose. The homogeneity criterion of a cluster is defined as the sum of correlation ratios (for qualitative variables) and squared correlations (for quantitative variables) to a synthetic quantitative variable, summarizing "as well as possible" the variables in the cluster. This synthetic variable is the first principal component obtained…
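The homogeneity criterion described above can be written as H(C) = sum over quantitative x_j in C of r^2(x_j, y) + sum over qualitative z_j in C of eta^2(y | z_j), where y is the synthetic variable, r^2 is the squared Pearson correlation, and eta^2 is the correlation ratio. The sketch below is a minimal, hypothetical illustration of that formula in plain Python, not the ClustOfVar API: the helper names `squared_corr`, `correlation_ratio`, `homogeneity` and `first_pc` are assumptions made for this example. For mixed clusters it only evaluates H(C) for a given y. For the quantitative-only case it computes y as the first principal component of the standardized variables (by power iteration). In that case H(C) equals the largest eigenvalue of the correlation matrix, which is the maximum H(C) can reach over all choices of y.

```python
import math


def mean(v):
    return sum(v) / len(v)


def standardize(v):
    # Center and scale to unit (population) variance.
    m = mean(v)
    sd = math.sqrt(sum((a - m) ** 2 for a in v) / len(v))
    return [(a - m) / sd for a in v]


def squared_corr(x, y):
    # Hypothetical helper: squared Pearson correlation r^2(x, y).
    xs, ys = standardize(x), standardize(y)
    r = sum(a * b for a, b in zip(xs, ys)) / len(xs)
    return r * r


def correlation_ratio(y, z):
    # Hypothetical helper: eta^2(y | z) = between-category variance / total variance of y.
    m = mean(y)
    total = sum((a - m) ** 2 for a in y)
    groups = {}
    for a, c in zip(y, z):
        groups.setdefault(c, []).append(a)
    between = sum(len(g) * (mean(g) - m) ** 2 for g in groups.values())
    return between / total


def homogeneity(y, quanti, quali):
    # H(C): squared correlations for quantitative variables plus
    # correlation ratios for qualitative variables, all measured against y.
    return (sum(squared_corr(x, y) for x in quanti)
            + sum(correlation_ratio(y, z) for z in quali))


def first_pc(quanti, iters=500):
    # Quantitative-only case: first principal component of the standardized
    # variables, found by power iteration on the correlation matrix R.
    cols = [standardize(x) for x in quanti]
    n, p = len(cols[0]), len(cols)
    R = [[sum(a * b for a, b in zip(cols[i], cols[j])) / n for j in range(p)]
         for i in range(p)]
    v = [1.0] * p
    for _ in range(iters):
        w = [sum(R[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(a * a for a in w))
        v = [a / norm for a in w]
    # Rayleigh quotient gives the (largest) eigenvalue for the converged vector.
    lam = sum(v[i] * sum(R[i][j] * v[j] for j in range(p)) for i in range(p))
    scores = [sum(v[j] * cols[j][k] for j in range(p)) for k in range(n)]
    return scores, lam


if __name__ == "__main__":
    # Toy data invented for illustration only.
    x1 = [1, 2, 3, 4, 5, 6, 7, 8]
    x2 = [2, 1, 4, 3, 6, 5, 8, 7]
    x3 = [1, 3, 2, 5, 4, 7, 6, 8]
    z = ["a", "a", "b", "b", "b", "c", "c", "c"]
    y, lam = first_pc([x1, x2, x3])
    print("largest eigenvalue:", lam)
    print("H, quantitative only:", homogeneity(y, [x1, x2, x3], []))
    print("H, with one qualitative variable added:", homogeneity(y, [x1, x2, x3], [z]))
```

Note that in the mixed call the synthetic variable is still the PCA component of the quantitative variables only, so it is not claimed to maximize the mixed criterion; it just shows how both kinds of terms enter H(C).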
Taxonomy
Topics: Advanced Clustering Algorithms Research
