Unsupervised Variable Selection for Ultrahigh-Dimensional Clustering Analysis
Tonglin Zhang, Huyunting Huang

TL;DR
This paper introduces an unsupervised variable selection method, FPCFL, that improves clustering performance in ultrahigh-dimensional data by effectively distinguishing informative variables from uninformative ones.
Contribution
The paper proposes FPCFL, an unsupervised variable selection method that identifies active, redundant, and uninformative variables and thereby improves clustering accuracy.
Findings
FPCFL outperforms existing methods in simulations.
Excluding uninformative variables improves clustering results.
Selecting a subset of the relevant variables can give clustering results as good as selecting all of them.
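The second finding can be illustrated with a small simulation. The sketch below is not the FPCFL method; it only assumes a toy setting (two Gaussian clusters, two informative variables, several thousand pure-noise variables) and uses a plain NumPy k-means as a stand-in clustering method. All function names, sample sizes, and the mean shift are hypothetical choices made for illustration.

```python
import numpy as np


def sq_dists(X, centers):
    """Squared Euclidean distance from every row of X to every centre."""
    x2 = (X ** 2).sum(axis=1)[:, None]
    c2 = (centers ** 2).sum(axis=1)[None, :]
    return x2 - 2.0 * (X @ centers.T) + c2


def kmeans(X, k, n_init=10, n_iter=100, seed=0):
    """Lloyd's algorithm with random restarts; returns the lowest-loss labels."""
    rng = np.random.default_rng(seed)
    best_labels, best_loss = None, np.inf
    for _ in range(n_init):
        # Start from k distinct, randomly chosen observations.
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            labels = sq_dists(X, centers).argmin(axis=1)
            # Recompute each centre; keep the old one if a cluster is empty.
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        d2 = sq_dists(X, centers)
        labels, loss = d2.argmin(axis=1), d2.min(axis=1).sum()
        if loss < best_loss:
            best_labels, best_loss = labels, loss
    return best_labels


def two_cluster_accuracy(labels, truth):
    """Agreement with the truth, allowing for swapped cluster labels."""
    agree = np.mean(labels == truth)
    return max(agree, 1.0 - agree)


def simulate(n_per_cluster=100, n_informative=2, n_noise=5000, shift=1.5, seed=0):
    """Toy data: informative columns first, then uninformative noise columns."""
    rng = np.random.default_rng(seed)
    n = 2 * n_per_cluster
    truth = np.repeat([0, 1], n_per_cluster)
    signs = np.where(truth == 0, -1.0, 1.0)[:, None]
    informative = signs * shift + rng.standard_normal((n, n_informative))
    noise = rng.standard_normal((n, n_noise))  # same distribution in both clusters
    return np.hstack([informative, noise]), truth


X, truth = simulate()
acc_all = two_cluster_accuracy(kmeans(X, 2), truth)
acc_informative = two_cluster_accuracy(kmeans(X[:, :2], 2), truth)
print(f"all variables:         accuracy = {acc_all:.3f}")
print(f"informative variables: accuracy = {acc_informative:.3f}")
```

In this setting the informative-only run should recover the two clusters almost perfectly, while the all-variable run is likely to be noticeably worse because the noise columns dominate the distances. The exact numbers depend on the seed and the assumed parameters, and the oracle choice of columns here stands in for what a selection method would have to discover from the data.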
Abstract
Research on unsupervised variable selection lags far behind research on supervised variable selection. A forward partial-variable clustering full-variable loss (FPCFL) method is proposed to address the corresponding challenges. An advantage of the FPCFL method is that it can distinguish active, redundant, and uninformative variables, which previous methods cannot. Theoretical and simulation studies show that a clustering method using all the variables can perform worse when many uninformative variables are involved, and that better results are expected if the uninformative variables are excluded. The research addresses a previous concern about how variable selection affects clustering performance. Unlike many previous methods, which attempt to select all the relevant variables, the proposed method selects a subset that can induce an equally good result. This phenomenon does…
Taxonomy
Topics: Advanced Clustering Algorithms Research · Face and Expression Recognition
