Pooled variable scaling for cluster analysis
Jakob Raymaekers, Ruben H. Zamar

TL;DR
This paper introduces a novel pooled variance-based scaling method for cluster analysis that preserves the influence of informative variables, validated through simulations and real genomic data applications.
Contribution
The paper presents a new scaling approach based on pooled variance that improves clustering performance by maintaining the effect of informative variables.
Findings
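In simulations, the proposed scaling method outperforms traditional scaling by the standard deviation or the range.
It is applied to cluster a high-dimensional genomic dataset of breast cancer gene expression.
The simulation study and well-known real data examples indicate that the method is safe and generally useful.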
Abstract
We propose a new approach for scaling prior to cluster analysis based on the concept of pooled variance. Unlike available scaling procedures such as scaling by the standard deviation or the range, our proposed scale avoids dampening the beneficial effect of informative clustering variables. We confirm through an extensive simulation study and applications to well-known real data examples that the proposed scaling method is safe and generally useful. Finally, we use our approach to cluster a high-dimensional genomic dataset consisting of gene expression data for several specimens of breast cancer tissue.
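To make the contrast in the abstract concrete, the following is a minimal sketch of why dividing by a pooled (within-group) standard deviation keeps an informative variable influential while dividing by the overall standard deviation dampens it. It is not the authors' procedure: the abstract does not say how groups are obtained, so the sketch assumes the group labels are already known, and the function name `pooled_sd` and the toy data are hypothetical.

```python
import numpy as np


def pooled_sd(x, labels):
    """Pooled within-group standard deviation of one variable.

    Assumption: group labels are given. In practice they are unknown
    before clustering and would have to be estimated somehow.
    """
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    dof = x.size - groups.size  # n - K degrees of freedom
    if dof <= 0:
        raise ValueError("need more observations than groups")
    # Sum of squared deviations from each group's own mean.
    ss = sum(((x[labels == g] - x[labels == g].mean()) ** 2).sum() for g in groups)
    return float(np.sqrt(ss / dof))


# Toy data (hypothetical): one variable that separates the two groups,
# and one noise variable that does not.
labels = np.array([0, 0, 0, 1, 1, 1])
informative = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
noise = np.array([-1.0, 0.0, 1.0, -1.0, 0.0, 1.0])

for name, x in [("informative", informative), ("noise", noise)]:
    # Distance between group means after each kind of scaling.
    gap = abs(x[labels == 1].mean() - x[labels == 0].mean())
    print(name,
          "| gap / overall SD:", gap / x.std(ddof=1),
          "| gap / pooled SD:", gap / pooled_sd(x, labels))
```

For the informative variable, the overall standard deviation is inflated by the very separation between groups, so dividing by it shrinks that separation. The pooled standard deviation measures only within-group spread, so the separation survives the scaling, while the noise variable is scaled to a comparable within-group spread.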
