Feature screening for clustering analysis
Changhu Wang, Zihao Chen, Ruibin Xi

TL;DR
This paper introduces a feature screening method for ultrahigh dimensional clustering that evaluates the homogeneity of feature mixture distributions using the EM-test, improving clustering accuracy and efficiency.
Contribution
It proposes a novel screening procedure based on the EM-test to identify cluster-relevant features in high-dimensional data, with theoretical guarantees and practical validation.
Findings
The method accurately screens important features in simulations.
Under general parametric settings, the screening procedure is shown to achieve sure independent screening and selection consistency.
It significantly improves clustering performance on real data.
Abstract
In this paper, we consider feature screening for ultrahigh dimensional clustering analyses. Based on the observation that the marginal distribution of any given feature is a mixture of its conditional distributions in different clusters, we propose to screen clustering features by independently evaluating the homogeneity of each feature's mixture distribution. Important cluster-relevant features have heterogeneous components in their mixture distributions and unimportant features have homogeneous components. The well-known EM-test statistic is used to evaluate the homogeneity. Under general parametric settings, we establish the tail probability bounds of the EM-test statistic for the homogeneous and heterogeneous features, and further show that the proposed screening procedure can achieve the sure independent screening and even the consistency in selection properties. Limiting…
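The per-feature screening idea in the abstract can be illustrated with a small sketch. It is not the authors' procedure: the EM-test statistic is not reproduced here, and an ordinary likelihood-ratio statistic (one Gaussian versus a two-component Gaussian mixture with a shared variance) stands in for it. The function names, the Gaussian model, the top-`keep` ranking rule, and the toy data are all assumptions made for illustration.

```python
import math
import random


def normal_logpdf(x, mu, var):
    """Log-density of N(mu, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def loglik_one_component(xs):
    """Maximized log-likelihood of a single Gaussian (homogeneous model)."""
    n = len(xs)
    mu = sum(xs) / n
    var = max(sum((x - mu) ** 2 for x in xs) / n, 1e-8)
    return sum(normal_logpdf(x, mu, var) for x in xs)


def loglik_two_component(xs, n_iter=100):
    """Log-likelihood after EM for a two-component, shared-variance mixture."""
    n = len(xs)
    s = sorted(xs)
    half = n // 2
    # Start the two means at the averages of the lower and upper halves.
    mu1 = sum(s[:half]) / half
    mu2 = sum(s[half:]) / (n - half)
    mean = sum(xs) / n
    var = max(sum((x - mean) ** 2 for x in xs) / n, 1e-8)
    alpha = 0.5
    for _ in range(n_iter):
        # E-step: posterior probability that each point is in component 1.
        resp = []
        for x in xs:
            a = math.log(alpha) + normal_logpdf(x, mu1, var)
            b = math.log(1.0 - alpha) + normal_logpdf(x, mu2, var)
            resp.append(1.0 / (1.0 + math.exp(min(b - a, 700.0))))
        # M-step: update mixing proportion, means, and shared variance.
        w1 = sum(resp)
        alpha = min(max(w1 / n, 1e-6), 1.0 - 1e-6)
        mu1 = sum(r * x for r, x in zip(resp, xs)) / max(w1, 1e-12)
        mu2 = sum((1.0 - r) * x for r, x in zip(resp, xs)) / max(n - w1, 1e-12)
        ss = sum(r * (x - mu1) ** 2 + (1.0 - r) * (x - mu2) ** 2
                 for r, x in zip(resp, xs))
        var = max(ss / n, 1e-8)
    total = 0.0
    for x in xs:
        a = math.log(alpha) + normal_logpdf(x, mu1, var)
        b = math.log(1.0 - alpha) + normal_logpdf(x, mu2, var)
        m = max(a, b)
        total += m + math.log(math.exp(a - m) + math.exp(b - m))
    return total


def homogeneity_stat(xs):
    """Likelihood-ratio statistic (a stand-in, NOT the EM-test statistic)."""
    return max(0.0, 2.0 * (loglik_two_component(xs) - loglik_one_component(xs)))


def screen_features(columns, keep):
    """Rank features by the statistic and keep the `keep` largest (assumed rule)."""
    stats = [homogeneity_stat(col) for col in columns]
    order = sorted(range(len(columns)), key=lambda j: stats[j], reverse=True)
    return order[:keep], stats


# Toy data (hypothetical): 200 samples in two clusters, 20 features,
# of which only the first three differ in mean between clusters.
random.seed(0)
n, p = 200, 20
labels = [i % 2 for i in range(n)]
columns = []
for j in range(p):
    if j < 3:
        col = [random.gauss(2.5 if g else -2.5, 1.0) for g in labels]
    else:
        col = [random.gauss(0.0, 1.0) for _ in labels]
    columns.append(col)

selected, stats = screen_features(columns, keep=3)
print("selected feature indices:", sorted(selected))
```

Each feature is scored independently, so the loop over features parallelizes trivially. In practice a threshold on the statistic, rather than a fixed `keep`, would decide which features are retained; the ranking rule here is only a convenience for the toy example.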
Taxonomy
Topics: Bayesian Methods and Mixture Models · Advanced Clustering Algorithms Research · Gene expression and cancer classification
