Regularized k-POD: Sparse k-means clustering for high-dimensional missing data
Xin Guan, Yoshikazu Terada

TL;DR
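This paper introduces a regularized k-POD method for clustering high-dimensional data with missing values. Feature-wise regularization of the cluster centers reduces the bias caused by features irrelevant to the cluster structure, while keeping the computational efficiency and flexibility of k-POD.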
Contribution
It proposes the first bias-mitigating regularized k-POD approach for high-dimensional missing data clustering, improving accuracy over existing methods.
Findings
Effectively reduces bias in high-dimensional missing data clustering.
Improves clustering accuracy in simulations and real-world data.
Maintains computational efficiency and flexibility.
Abstract
The classical k-means clustering, based on distances computed from all data features, cannot be directly applied to incomplete data with missing values. A natural extension of k-means to missing data, namely k-POD, uses only the observed entries for clustering and is both computationally efficient and flexible. However, for high-dimensional missing data that include features irrelevant to the underlying cluster structure, these irrelevant features bias the k-POD estimates of the cluster centers, which degrades clustering performance. Nevertheless, the existing k-POD method performs well in low-dimensional cases, highlighting the importance of addressing the bias issue. To this end, in this paper, we propose a regularized k-POD clustering method that adds feature-wise regularization on the cluster centers to the existing k-POD clustering. Such a penalty on cluster…
Taxonomy
Topics: Face and Expression Recognition · Advanced Clustering Algorithms Research · Bayesian Methods and Mixture Models
