Achieving Approximate Soft Clustering in Data Streams

Vaneet Aggarwal; Shankar Krishnan

arXiv:1207.6199 · cs.DS · March 20, 2015 · 5 cites

TL;DR

This paper introduces a novel one-pass streaming algorithm for approximate soft clustering that efficiently processes data streams and adapts to evolving data, with applications in density estimation and mixture models.

Contribution

It presents a one-pass streaming soft clustering algorithm built from a pseudo-approximation in terms of the hard k-means objective, and extends it to moving-window scenarios.

Findings

01. Achieves a pseudo-approximation to soft clustering in data streams.

02. Extends the algorithm to handle moving window clustering.

03. Utilizes an extension of the k-means++ algorithm for streaming data.
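
For orientation, the standard batch k-means++ seeding step (D² sampling) can be sketched as follows. This is the well-known textbook procedure, not the streaming extension the paper relies on; the function names `sq_dist` and `kmeanspp_seed` are illustrative assumptions and do not come from the paper.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_seed(points, k, rng=None):
    """Pick k initial centers with standard (batch) k-means++ D^2 sampling.

    Assumption: this is the classic batch procedure, shown only to
    illustrate the idea; it is not the paper's streaming variant.
    """
    rng = rng or random.Random()
    # First center: chosen uniformly at random.
    centers = [rng.choice(points)]
    # d2[i] holds the squared distance from points[i] to its nearest center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            centers.append(rng.choice(points))
        else:
            # Sample the next center with probability proportional to d2.
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
            centers.append(points[idx])
        d2 = [min(old, sq_dist(p, centers[-1])) for old, p in zip(d2, points)]
    return centers
```

Points far from every chosen center are more likely to be picked next, which is what spreads the initial centers across the data.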

Abstract

In recent years, data streaming has gained prominence due to advances in technologies that enable many applications to generate continuous flows of data. This increases the need to develop algorithms that can efficiently process data streams. Additionally, real-time requirements and the evolving nature of data streams make stream mining problems, including clustering, challenging research problems. In this paper, we propose a one-pass streaming soft clustering algorithm (in which the membership of a point in a cluster is described by a distribution) that approximates the "soft" version of the k-means objective function. Soft clustering has applications in various aspects of databases and machine learning, including density estimation and learning mixture models. We first achieve a simple pseudo-approximation in terms of the "hard" k-means algorithm, where the algorithm is allowed to output…
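
To make "membership described by a distribution" concrete, here is a minimal sketch of one common way to form soft memberships and a membership-weighted squared-distance cost. The exponential weighting, the `beta` sharpness parameter, and the names `soft_memberships` and `soft_cost` are assumptions chosen for illustration; the abstract does not specify the paper's exact soft objective.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def soft_memberships(point, centers, beta=1.0):
    """Return a probability distribution over centers for one point.

    Assumption: weights are proportional to exp(-beta * squared distance),
    one common choice; the paper's definition may differ.
    """
    d2 = [sq_dist(point, c) for c in centers]
    shift = min(d2)  # subtract the minimum for numerical stability
    weights = [math.exp(-beta * (d - shift)) for d in d2]
    total = sum(weights)
    return [w / total for w in weights]


def soft_cost(points, centers, beta=1.0):
    """Sum over points of the membership-weighted squared distances."""
    cost = 0.0
    for p in points:
        probs = soft_memberships(p, centers, beta)
        cost += sum(pr * sq_dist(p, c) for pr, c in zip(probs, centers))
    return cost
```

As `beta` grows, each distribution concentrates on the nearest center, so this illustrative cost approaches the usual hard k-means cost.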

Taxonomy

Topics: Advanced Clustering Algorithms Research · Bayesian Methods and Mixture Models · Data Stream Mining Techniques