$k$-Median Clustering via Metric Embedding: Towards Better Initialization with Differential Privacy
Chenglin Fan, Ping Li, Xiaoyun Li

TL;DR
This paper introduces a new initialization method for $k$-median clustering, built on a metric embedding tree, that yields better initial centers, extends to differential privacy, and improves approximation error bounds.
Contribution
We propose the HST initialization scheme for $k$-median in general metric spaces, which produces initial centers with lower error than $k$-median++ and can be adapted to differential privacy.
Findings
HST initialization achieves lower error than $k$-median++
The method extends to differential privacy with improved error bounds
Experimental results validate theoretical advantages
Abstract
When designing clustering algorithms, the choice of initial centers is crucial for the quality of the learned clusters. In this paper, we develop a new initialization scheme, called HST initialization, for the $k$-median problem in the general metric space (e.g., discrete space induced by graphs), based on the construction of a metric embedding tree structure of the data. From the tree, we propose a novel and efficient search algorithm for good initial centers that can be used subsequently for the local search algorithm. Our proposed HST initialization can produce initial centers achieving lower errors than those from another popular initialization method, $k$-median++, with comparable efficiency. The HST initialization can also be extended to the setting of differential privacy (DP) to generate private initial centers. We show that the error from applying DP local search followed by our…
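To make the comparison in the abstract concrete, the following is a minimal sketch of the $k$-median objective and of the baseline seeding, not of the HST initialization itself. It assumes that $k$-median++ means distance-proportional sampling in the style of $k$-means++ (using distances rather than squared distances). The function names `kmedian_cost` and `kmedianpp_init`, the distance-matrix input, and the toy data are illustrative assumptions, not taken from the paper.

```python
import random


def kmedian_cost(dist, centers):
    """k-median objective: sum over points of the distance to the nearest center.

    dist is a full n x n distance matrix (list of lists); centers is a list
    of point indices.
    """
    return sum(min(row[c] for c in centers) for row in dist)


def kmedianpp_init(dist, k, rng):
    """Assumed k-median++ style seeding (requires k <= n).

    The first center is drawn uniformly; each later center is drawn with
    probability proportional to its distance to the nearest chosen center.
    """
    n = len(dist)
    centers = [rng.randrange(n)]
    while len(centers) < k:
        weights = [min(dist[i][c] for c in centers) for i in range(n)]
        if sum(weights) == 0:
            # Every point coincides with a chosen center: pick any unused index.
            unused = [i for i in range(n) if i not in centers]
            centers.append(rng.choice(unused))
        else:
            # Already chosen centers have weight 0 and are never drawn again.
            centers.append(rng.choices(range(n), weights=weights)[0])
    return centers


# Toy metric: six points on a line forming two well-separated groups.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(p - q) for q in points] for p in points]

seeded = kmedianpp_init(dist, 2, random.Random(0))
print("seeded centers:", seeded, "cost:", kmedian_cost(dist, seeded))
print("cost of centers at indices 1 and 4:", kmedian_cost(dist, [1, 4]))
```

On this toy metric the centers at indices 1 and 4 (the middle of each group) give a cost of 4. An initialization method is judged by how close its seeded cost comes to such a value before local search starts.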
Taxonomy
Topics: Privacy-Preserving Technologies in Data · HIV, Drug Use, Sexual Risk · Bayesian Methods and Mixture Models
