Khatri-Rao Clustering for Data Summarization
Martino Ciaperoni, Collin Leiber, Aristides Gionis, Heikki Mannila

TL;DR
This paper introduces the Khatri-Rao clustering paradigm to produce more succinct and accurate data summaries by extending centroid-based clustering methods, including k-Means and deep clustering, through interaction of protocentroids.
Contribution
The paper proposes the Khatri-Rao clustering framework and accompanying algorithms, which improve data summarization by reducing redundancy and offering better trade-offs between succinctness and accuracy.
Findings
Khatri-Rao k-Means outperforms standard k-Means in data summarization.
Deep clustering with Khatri-Rao further reduces summary size while maintaining accuracy.
Experiments demonstrate improved trade-offs in data summarization quality.
Abstract
As datasets continue to grow in size and complexity, finding succinct yet accurate data summaries poses a key challenge. Centroid-based clustering, a widely adopted approach to address this challenge, finds informative summaries of datasets in terms of few prototypes, each representing a cluster in the data. Despite the wide adoption of these methods, the resulting data summaries often contain redundancies, limiting their effectiveness, particularly in datasets characterized by a large number of underlying clusters. To overcome this limitation, we introduce the Khatri-Rao clustering paradigm that extends traditional centroid-based clustering to produce more succinct but equally accurate data summaries by postulating that centroids arise from the interaction of two or more succinct sets of protocentroids. We study two central approaches to centroid-based clustering, namely the well-established k-Means…
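To make the idea of centroids arising from interacting protocentroids concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how protocentroids interact, so the element-wise sum used as the default `combine` operator is an assumption, and the names `khatri_rao_centroids` and `assign` are hypothetical. The point it shows is the source of the succinctness: two sets with 3 and 2 protocentroids (5 stored vectors) yield 3 × 2 = 6 centroids.

```python
import numpy as np


def khatri_rao_centroids(A, B, combine=np.add):
    """Combine every row of A (h1 x d) with every row of B (h2 x d).

    Returns an (h1 * h2) x d array in which row i * h2 + j equals
    combine(A[i], B[j]). The default element-wise sum is an assumption
    made for illustration; np.multiply could be passed instead.
    """
    # Broadcasting: (h1, 1, d) op (1, h2, d) -> (h1, h2, d)
    return combine(A[:, None, :], B[None, :, :]).reshape(-1, A.shape[1])


def assign(X, centroids):
    """Assign each point in X to its nearest centroid (squared Euclidean)."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


# Two small sets of protocentroids (toy values chosen for illustration).
A = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])  # 3 protocentroids
B = np.array([[0.0, 0.0], [1.0, 1.0]])                # 2 protocentroids

C = khatri_rao_centroids(A, B)  # 6 centroids from 5 stored vectors
X = np.array([[10.9, 1.2], [0.1, 9.8]])
labels = assign(X, C)

print(C.shape)  # (6, 2)
print(labels)   # index of the nearest combined centroid for each point
```

With more sets, or larger ones, the gap widens: the number of stored vectors grows with the sum of the set sizes, while the number of representable centroids grows with their product.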
Taxonomy
Topics: Advanced Clustering Algorithms Research · Data Quality and Management · Time Series Analysis and Forecasting
