Clustering with Missing Features: A Penalized Dissimilarity Measure based approach
Shounak Datta, Supritam Bhattacharjee, Swagatam Das

TL;DR
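This paper introduces the Feature Weighted Penalty based Dissimilarity (FWPD), a penalized dissimilarity measure that lets k-means and hierarchical agglomerative clustering run directly on incomplete datasets without imputation, and reports better clustering results than imputation-based preprocessing.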
Contribution
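The paper proposes the FWPD measure and modifies the k-means and hierarchical agglomerative clustering algorithms to handle missing features directly. It also provides time complexity analyses, a theoretical convergence analysis for the FWPD-based k-means algorithm, and extensive experiments.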
Findings
FWPD-based clustering outperforms imputation methods on benchmark datasets.
The modified algorithms converge to local optima within finite iterations.
The approach effectively handles various missingness types, improving clustering quality.
Abstract
Many real-world clustering problems are plagued by incomplete data characterized by missing or absent features for some or all of the data instances. Traditional clustering methods cannot be directly applied to such data without preprocessing by imputation or marginalization techniques. In this article, we overcome this drawback by utilizing a penalized dissimilarity measure which we refer to as the Feature Weighted Penalty based Dissimilarity (FWPD). Using the FWPD measure, we modify the traditional k-means clustering algorithm and the standard hierarchical agglomerative clustering algorithms so as to make them directly applicable to datasets with missing features. We present time complexity analyses for these new techniques and also undertake a detailed theoretical analysis showing that the new FWPD based k-means algorithm converges to a local optimum within a finite number of…
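To make the idea of a penalized dissimilarity concrete, here is a minimal sketch in Python. The abstract does not give the formula, so everything below is an assumption chosen for illustration: the distance is Euclidean over the features observed in both instances and is scaled by the largest such distance, the penalty is the weighted share of features not observed in both, each feature's weight is the number of instances in which it is observed, and the hypothetical parameter `alpha` trades off the two terms. The name `fwpd_matrix` is likewise hypothetical, and the paper's exact definition may differ.

```python
import numpy as np

def fwpd_matrix(X, alpha=0.25):
    """Pairwise penalized dissimilarity for data with NaN-marked missing values.

    Illustrative sketch only: the weighting and normalization are assumptions.
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)                   # True where a feature value is present
    w = observed.sum(axis=0).astype(float)    # assumed weight: how many instances observe each feature
    n = X.shape[0]
    dist = np.zeros((n, n))
    penalty = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            common = observed[i] & observed[j]            # features present in both instances
            diff = X[i, common] - X[j, common]
            dist[i, j] = np.sqrt(np.sum(diff ** 2))       # distance in the common subspace
            penalty[i, j] = w[~common].sum() / w.sum()    # weighted share of non-common features
    d_max = dist.max()
    scaled = dist / d_max if d_max > 0 else dist          # scale distances into [0, 1]
    return (1 - alpha) * scaled + alpha * penalty

if __name__ == "__main__":
    nan = float("nan")
    X = [[1.0, 2.0, nan],
         [1.0, nan, 3.0],
         [4.0, 6.0, 3.0]]
    print(fwpd_matrix(X, alpha=0.5))
```

Under these assumptions, two instances that agree on their only shared feature still receive a nonzero dissimilarity, because the features they do not share contribute to the penalty term. The resulting matrix could be passed to any clustering routine that accepts precomputed pairwise dissimilarities.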
