Large-Scale Clustering Based on Data Compression
Xudong Ma

TL;DR
This paper introduces a scalable distributed clustering method based on data compression and optimization. It uses Dantzig-Wolfe decomposition to split the resulting non-convex optimization problem into small subproblems, so that large data sets can be clustered efficiently.
Contribution
It reformulates the clustering problem as an optimization problem and shows that Dantzig-Wolfe decomposition, normally restricted to convex problems, still applies to this non-convex problem because its duality gap vanishes as the data size grows.
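For readers unfamiliar with the method, the generic textbook form of Dantzig-Wolfe decomposition is sketched below. The notation (blocks k = 1..K, local objectives f_k, local feasible sets \mathcal{X}_k, coupling matrices A_k, right-hand side b) is introduced here only for illustration; the paper's own formulation of the classification-gain problem is not reproduced on this page, so this should not be read as the author's model.

```latex
% Original problem: K blocks (one per computer) coupled only through A_k x_k
\max_{x_1,\dots,x_K}\ \sum_{k=1}^{K} f_k(x_k)
\quad\text{s.t.}\quad \sum_{k=1}^{K} A_k x_k \le b,\qquad x_k \in \mathcal{X}_k .

% Restricted master problem (central processor) over proposals x_k^{(j)} \in \mathcal{X}_k
\max_{\alpha \ge 0}\ \sum_{k=1}^{K}\sum_{j} \alpha_{kj}\, f_k\big(x_k^{(j)}\big)
\quad\text{s.t.}\quad \sum_{k=1}^{K}\sum_{j} \alpha_{kj}\, A_k x_k^{(j)} \le b,
\qquad \sum_{j} \alpha_{kj} = 1 \quad (k = 1,\dots,K).

% Pricing subproblem solved independently by computer k, given the master's
% duals \lambda \ge 0 (coupling rows) and \mu_k (convexity row k)
\max_{x_k \in \mathcal{X}_k}\ f_k(x_k) - \lambda^{\top} A_k x_k ,
\quad\text{add } x_k \text{ as a new column if this value exceeds } \mu_k .
```

Because the master problem mixes proposals convexly, it effectively optimizes over the convex hull of each block. For a non-convex problem its value is therefore the Lagrangian dual bound, which can exceed the true optimum by the duality gap; this is why a vanishing gap is what makes the decomposition usable here.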
Findings
Effective clustering on large data sets
Minimal data communication between nodes
Convergence of the duality gap to zero in the large-scale limit
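The intuition behind a duality gap that vanishes at scale can be seen on a much simpler separable non-convex problem. The sketch below is a swapped-in toy, not the paper's problem or proof: a random 0-1 knapsack, where relaxing the single capacity constraint gives a dual bound equal to the LP relaxation, and a greedy integer solution shows the gap is at most one item's value. Since the objective grows with the number of items, the relative gap shrinks. The function name `knapsack_gap`, the uniform [1, 10] values and weights, and the quarter-of-total-weight capacity are all hypothetical choices.

```python
import random


def knapsack_gap(n: int, seed: int = 0) -> float:
    """Upper bound on the relative duality gap of a random 0-1 knapsack.

    Hypothetical toy instance: n items with values and weights drawn
    uniformly from [1, 10], capacity equal to a quarter of total weight.
    """
    rng = random.Random(seed)
    values = [rng.uniform(1, 10) for _ in range(n)]
    weights = [rng.uniform(1, 10) for _ in range(n)]
    capacity = 0.25 * sum(weights)

    # Relaxing the capacity constraint with one multiplier gives a dual
    # whose optimum equals the LP relaxation, computed greedily by value
    # density with a fractional last item.
    order = sorted(range(n), key=lambda i: values[i] / weights[i], reverse=True)
    used = dual_bound = greedy = 0.0
    for i in order:
        if used + weights[i] <= capacity:
            used += weights[i]
            dual_bound += values[i]
            greedy += values[i]  # feasible 0-1 solution built alongside
        else:
            dual_bound += values[i] * (capacity - used) / weights[i]
            break

    # greedy <= integer optimum <= dual_bound, and the difference is a
    # fraction of a single item's value (at most 10).
    return (dual_bound - greedy) / dual_bound


if __name__ == "__main__":
    for n in (10, 100, 1_000, 10_000):
        print(n, knapsack_gap(n))
```

The printed bound should fall roughly like 1/n: the numerator is capped by one item's value, while the dual bound is at least a quarter of the total value and so grows linearly in n.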
Abstract
This paper considers the clustering problem for large data sets. We propose an approach based on distributed optimization. The clustering problem is formulated as an optimization problem of maximizing the classification gain. We show that the optimization problem can be reformulated and decomposed into small-scale subproblems by using the Dantzig-Wolfe decomposition method. Generally speaking, the Dantzig-Wolfe method can only be used for convex optimization problems, where the duality gaps are zero. Even though the optimization problem considered in this paper is non-convex, we prove that the duality gap goes to zero as the problem size goes to infinity. Therefore, the Dantzig-Wolfe method can be applied here. In the proposed approach, the clustering problem is iteratively solved by a group of computers coordinated by one central processor, where each computer solves one…
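The coordination pattern the abstract describes (a central processor iterating with a group of computers, each solving a small subproblem) can be sketched in pure Python under loudly stated assumptions. The classification-gain objective is not defined on this page, so a hypothetical capacitated assignment problem is substituted: points are assigned to fixed centers, and each center may hold at most `capacity` points. Also, instead of solving Dantzig-Wolfe's restricted master LP, the coordinator here uses a projected subgradient update of the dual prices, i.e. plain Lagrangian dual decomposition (Dantzig-Wolfe's master LP is the LP dual of a cutting-plane method on this same Lagrangian dual). All names (`sq_dist`, `worker_solve`, `coordinate`, `capacity`, `step0`) are hypothetical. Each round sends only k prices to a worker and returns k load counts plus one scalar, never the raw points.

```python
import math


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def worker_solve(points, centers, prices):
    """Subproblem on one worker: assign each local point to the center that
    minimizes squared distance plus that center's price.

    Only the per-center load counts and one scalar are returned."""
    load = [0] * len(centers)
    lagrangian = 0.0
    for p in points:
        costs = [sq_dist(p, c) + lam for c, lam in zip(centers, prices)]
        best = min(range(len(centers)), key=costs.__getitem__)  # first index on ties
        load[best] += 1
        lagrangian += costs[best]
    return load, lagrangian


def coordinate(shards, centers, capacity, iters=50, step0=10.0):
    """Central processor: projected subgradient ascent on the dual prices."""
    k = len(centers)
    prices = [0.0] * k
    best_dual = -math.inf
    last = (prices, [0] * k)
    for t in range(iters):
        # Broadcast prices; in a real system each call runs on its own machine.
        replies = [worker_solve(shard, centers, prices) for shard in shards]
        load = [sum(r[0][c] for r in replies) for c in range(k)]
        # Dual function value: a lower bound on the capacitated optimum.
        dual = sum(r[1] for r in replies) - capacity * sum(prices)
        best_dual = max(best_dual, dual)
        last = (prices, load)
        # Raise prices of overloaded centers, lower those of underloaded
        # ones, and never let a price drop below 0.
        step = step0 / math.sqrt(t + 1)
        prices = [max(0.0, lam + step * (used - capacity))
                  for lam, used in zip(prices, load)]
    return last[0], last[1], best_dual


if __name__ == "__main__":
    shards = [[(1, 0), (2, 0), (3, 0)], [(4, 0), (5, 0), (6, 0)]]
    centers = [(0, 0), (10, 0)]
    prices, load, best_dual = coordinate(shards, centers, capacity=3)
    print(load)  # [3, 3]
    print(best_dual)  # close to 91, the cost of the best balanced assignment
```

On this six-point instance the unconstrained assignment puts five points on the first center. The prices push it to three points per center, and the best dual value matches the balanced assignment's cost of 91, so the gap is zero on this tiny example. That need not hold for integer problems in general, which is exactly the gap the paper's asymptotic result addresses.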
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Face and Expression Recognition · Advanced Clustering Algorithms Research
