Rk-means: Fast Clustering for Relational Data

Ryan Curtin; Ben Moseley; Hung Q. Ngo; XuanLong Nguyen; Dan Olteanu; Maximilian Schleich

arXiv:1910.04939·cs.LG·October 14, 2019·5 cites

TL;DR

Rk-means clusters relational data without running the expensive feature extraction query. It builds a small grid coreset of the data matrix and clusters that instead, which the paper reports gives orders-of-magnitude speedups over first materializing the data matrix and then clustering it.

Contribution

The paper introduces Rk-means, a fast clustering algorithm that operates directly on relational data without full data matrix computation, leveraging a grid coreset for efficiency.

Findings

1. Achieves a constant approximation for the k-means objective
2. Provides orders-of-magnitude speedup in empirical tests
3. Runs faster than data matrix computation on the database
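To make the cost of data matrix computation concrete, here is a minimal sketch of how a join can produce far more rows than its inputs contain. The relations `r` and `s`, the helper `natural_join`, and the sizes are hypothetical and chosen only for illustration; they do not come from the paper.

```python
def natural_join(r, s):
    """Join rows of r = [(a, b)] and s = [(b, c)] on the shared key b."""
    by_key = {}
    for b, c in s:
        by_key.setdefault(b, []).append(c)
    return [(a, b, c) for a, b in r for c in by_key.get(b, [])]

# Hypothetical relations: 100 rows each, but only two distinct join keys.
n = 100
r = [(i, i % 2) for i in range(n)]
s = [(j % 2, j) for j in range(n)]

joined = natural_join(r, s)
input_size = len(r) + len(s)

print("input rows:", input_size)
print("data matrix rows:", len(joined))
```

With few distinct join keys, every row of `r` pairs with half of the rows of `s`, so the join output grows quadratically while the input grows linearly.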

Abstract

Conventional machine learning algorithms cannot be applied until a data matrix is available to process. When the data matrix needs to be obtained from a relational database via a feature extraction query, the computation cost can be prohibitive, as the data matrix may be (much) larger than the total input relation size. This paper introduces Rk-means, or the relational k-means algorithm, for clustering relational data tuples without having to access the full data matrix. As such, we avoid having to run the expensive feature extraction query and store its output. Our algorithm leverages the underlying structures in relational data. It involves construction of a small grid coreset of the data matrix for subsequent cluster construction. This gives a constant approximation for the k-means objective, while having asymptotic runtime improvements over standard approaches of first running…
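The idea of a grid coreset can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: it holds the points in memory, which is exactly what Rk-means avoids, and the abstract does not say how the grid is built. Here the grid comes from clustering each coordinate separately, and the seeding rules are arbitrary. The names `lloyd_1d`, `grid_coreset`, `weighted_kmeans`, and the parameter `kappa` are hypothetical.

```python
from collections import Counter

def lloyd_1d(values, kappa, iters=20):
    """Lloyd's iterations on one coordinate, seeded at evenly spaced quantiles."""
    ordered = sorted(values)
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * kappa)] for i in range(kappa)]
    for _ in range(iters):
        buckets = [[] for _ in centers]
        for v in values:
            nearest = min(range(kappa), key=lambda i: (v - centers[i]) ** 2)
            buckets[nearest].append(v)
        # Keep the old center when a bucket is empty.
        centers = [sum(b) / len(b) if b else c for b, c in zip(buckets, centers)]
    return centers

def grid_coreset(points, kappa):
    """Snap each point to its nearest per-dimension center; return {grid point: weight}."""
    centers = [lloyd_1d(col, kappa) for col in zip(*points)]
    weights = Counter()
    for p in points:
        cell = tuple(min(cs, key=lambda c: (x - c) ** 2) for x, cs in zip(p, centers))
        weights[cell] += 1
    return dict(weights)  # only non-empty grid cells are stored

def weighted_kmeans(weights, k, iters=20):
    """Weighted Lloyd's iterations on the coreset, seeded at the k heaviest grid points."""
    grid = list(weights)
    centers = sorted(grid, key=weights.get, reverse=True)[:k]
    for _ in range(iters):
        sums = [[0.0] * len(grid[0]) for _ in centers]
        mass = [0.0] * len(centers)
        for g in grid:
            j = min(range(len(centers)),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(g, centers[i])))
            mass[j] += weights[g]
            for d, x in enumerate(g):
                sums[j][d] += weights[g] * x
        centers = [tuple(s / m for s in row) if m else c
                   for row, m, c in zip(sums, mass, centers)]
    return centers

# Hypothetical toy data: two well-separated groups of four points.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
coreset = grid_coreset(points, kappa=2)
centers = weighted_kmeans(coreset, k=2)
print(coreset)
print(centers)
```

The final clustering step only touches the weighted grid points, so its cost depends on the number of non-empty grid cells rather than on the number of rows in the data matrix.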

Taxonomy

Topics: Data Mining Algorithms and Applications · Advanced Clustering Algorithms Research · Face and Expression Recognition