Rk-means: Fast Clustering for Relational Data
Ryan Curtin, Ben Moseley, Hung Q. Ngo, XuanLong Nguyen, Dan Olteanu, Maximilian Schleich

TL;DR
Rk-means is a novel clustering algorithm for relational data that avoids expensive feature extraction by constructing a small grid coreset, achieving significant speedups over traditional methods.
Contribution
The paper introduces Rk-means, a fast clustering algorithm that operates directly on relational data without full data matrix computation, leveraging a grid coreset for efficiency.
Findings
Achieves a constant-factor approximation of the k-means objective
Provides orders-of-magnitude speedups in empirical tests
Runs faster than computing the data matrix in the database
Abstract
Conventional machine learning algorithms cannot be applied until a data matrix is available to process. When the data matrix needs to be obtained from a relational database via a feature extraction query, the computation cost can be prohibitive, as the data matrix may be (much) larger than the total input relation size. This paper introduces Rk-means, or the relational k-means algorithm, for clustering relational data tuples without having to access the full data matrix. As such, we avoid having to run the expensive feature extraction query and store its output. Our algorithm leverages the underlying structures in relational data. It involves construction of a small grid coreset of the data matrix for subsequent cluster construction. This gives a constant approximation for the k-means objective, while having asymptotic runtime improvements over standard approaches of first running…
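The abstract does not spell out how the grid coreset is built, so the following is only a minimal sketch of one plausible reading: cluster each feature dimension separately, form the grid of per-dimension centroids, weight each occupied grid cell by the number of tuples that snap to it, and then run weighted k-means on that small grid. The function names (kmeans_1d, grid_coreset, weighted_kmeans) and the parameter kappa (centroids per dimension) are hypothetical. The sketch also takes a materialized matrix X for readability, which is exactly what the paper aims to avoid; in the relational setting the per-dimension statistics and cell weights would presumably be computed by queries over the input relations instead.

```python
import numpy as np

def kmeans_1d(values, weights, k, iters=20):
    """Weighted Lloyd's iterations on a 1-D array; returns k centroids."""
    # Illustrative initialisation: evenly spaced quantiles of the data.
    centroids = np.quantile(values, np.linspace(0, 1, k))
    for _ in range(iters):
        assign = np.argmin(np.abs(values[:, None] - centroids[None, :]), axis=1)
        for j in range(k):
            mask = assign == j
            if mask.any():
                centroids[j] = np.average(values[mask], weights=weights[mask])
    return centroids

def grid_coreset(X, kappa):
    """Return (points, weights): occupied grid cells and their tuple counts."""
    n, d = X.shape
    ones = np.ones(n)
    # Assumption: each dimension is clustered independently into kappa centroids.
    per_dim = [kmeans_1d(X[:, j], ones, kappa) for j in range(d)]
    # Snap every coordinate to the index of its nearest per-dimension centroid.
    codes = np.stack(
        [np.argmin(np.abs(X[:, j][:, None] - per_dim[j][None, :]), axis=1)
         for j in range(d)],
        axis=1,
    )
    # Each distinct code is one occupied grid cell; its count is the weight.
    cells, counts = np.unique(codes, axis=0, return_counts=True)
    points = np.stack([per_dim[j][cells[:, j]] for j in range(d)], axis=1)
    return points, counts.astype(float)

def weighted_kmeans(P, w, k, iters=50, seed=0):
    """Weighted Lloyd's algorithm on coreset points P with weights w."""
    rng = np.random.default_rng(seed)
    # Seed centers by sampling coreset points in proportion to their weight.
    centers = P[rng.choice(len(P), size=k, replace=False, p=w / w.sum())]
    for _ in range(iters):
        d2 = ((P[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        for j in range(k):
            mask = assign == j
            if mask.any():
                centers[j] = np.average(P[mask], axis=0, weights=w[mask])
    return centers
```

In this sketch the final clustering step touches at most kappa^d grid points rather than all n tuples, which is where a speedup of the kind the abstract describes would come from, provided the grid weights can be obtained without materializing X.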
Taxonomy
Topics: Data Mining Algorithms and Applications · Advanced Clustering Algorithms Research · Face and Expression Recognition
