$k$-POD: A Method for $k$-Means Clustering of Missing Data
Jocelyn T. Chi, Eric C. Chi, Richard G. Baraniuk

TL;DR
The paper introduces $k$-POD, an extension of $k$-means clustering designed to handle missing data effectively without requiring data imputation or external information, even with high missingness.
Contribution
The novel $k$-POD method enables $k$-means clustering directly on incomplete data, addressing limitations of existing imputation-based approaches.
Findings
Works with unknown missingness mechanisms
Effective with high levels of missing data
Avoids costly data imputation
Abstract
The $k$-means algorithm is often used in clustering applications but its usage requires a complete data matrix. Missing data, however, is common in many applications. Mainstream approaches to clustering missing data reduce the missing data problem to a complete data formulation through either deletion or imputation but these solutions may incur significant costs. Our $k$-POD method presents a simple extension of $k$-means clustering for missing data that works even when the missingness mechanism is unknown, when external information is unavailable, and when there is significant missingness in the data.
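The abstract does not spell out the algorithm. As a rough illustration only, the sketch below assumes an alternating scheme: fill each missing entry with the matching coordinate of the row's currently assigned centroid, rerun $k$-means on the filled matrix, and repeat until the fill-ins stop changing. The names `kpod_sketch` and `_kmeans`, the column-mean starting fill, and the deterministic farthest-first initialization are hypothetical choices made for this example, not the authors' implementation.

```python
import numpy as np


def _kmeans(X, k, n_iter=100):
    """Plain Lloyd iterations on a complete matrix X (no NaNs)."""
    # Assumption: deterministic farthest-first initialization, chosen here
    # only to keep the example reproducible.
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)

    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every center.
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for j in range(k):
            members = labels == j
            if members.any():  # leave an empty cluster's center unchanged
                centers[j] = X[members].mean(axis=0)
    return labels, centers


def kpod_sketch(X, k, n_outer=50):
    """Cluster rows of X, where NaN marks a missing entry (hypothetical sketch)."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Assumption: start from column means of the observed entries.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_outer):
        labels, centers = _kmeans(filled, k)
        # Observed entries are never altered; only missing ones are refreshed
        # with the assigned centroid's coordinates.
        updated = np.where(missing, centers[labels], X)
        if np.allclose(updated, filled):
            break
        filled = updated
    return labels, centers, filled
```

Calling `kpod_sketch(X, k=2)` on a NumPy array with `np.nan` in the missing positions returns the cluster labels, the centroids, and the filled-in matrix. The filled values here are a by-product of the clustering iterations under the stated assumptions, not a separate imputation step performed beforehand.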
