# Streaming k-Means Clustering with Fast Queries

**Authors:** Yu Zhang, Kanat Tangwongsan, Srikanta Tirthapura

arXiv: 1701.03826 · 2018-12-10

## TL;DR

This paper introduces streaming k-means clustering methods that answer cluster-center queries faster by reusing coresets (data summaries) computed for recent queries, while retaining provably small approximation error and low space usage.

## Contribution

The paper proposes a novel coreset caching technique that improves query speed in streaming k-means clustering while maintaining provable accuracy and low space complexity.
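For readers unfamiliar with the term, the "provable accuracy" of a coreset is usually stated as follows. This is the standard textbook formulation, written in notation assumed here ($P$, $S$, $w$, $C$, $\phi$, $\varepsilon$); the paper's own definition and constants may differ. For a point set $P \subset \mathbb{R}^d$ and a set $C$ of $k$ centers, the k-means cost is

$$
\phi_C(P) = \sum_{x \in P} \min_{c \in C} \lVert x - c \rVert^2 .
$$

A weighted set $S$ with weights $w$ has cost $\phi_C(S) = \sum_{x \in S} w(x) \min_{c \in C} \lVert x - c \rVert^2$, and $S$ is called a $(k, \varepsilon)$-coreset of $P$ if, for every set $C$ of $k$ centers,

$$
(1 - \varepsilon)\,\phi_C(P) \;\le\; \phi_C(S) \;\le\; (1 + \varepsilon)\,\phi_C(P) .
$$

Under this definition, clustering the small set $S$ instead of $P$ changes the cost of any candidate solution by at most a factor of $1 \pm \varepsilon$.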

## Key findings

- Substantial reduction in query time compared to previous methods
- Maintains small approximation error in clustering results
- Supports correctness and efficiency with both theoretical analysis and detailed experiments

## Abstract

We present methods for k-means clustering on a stream with a focus on providing fast responses to clustering queries. Compared to the current state-of-the-art, our methods provide substantial improvement in the query time for cluster centers while retaining the desirable properties of provably small approximation error and low space usage. Our algorithms rely on a novel idea of "coreset caching" that systematically reuses coresets (summaries of data) computed for recent queries in answering the current clustering query. We present both theoretical analysis and detailed experiments demonstrating their correctness and efficiency.
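The caching idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the class `CachedStreamSummary`, the helper `reduce_summary`, and the parameters `bucket_size` and `summary_size` are all hypothetical names, and the summarizer is a plain weighted sample standing in for a real coreset construction. The sketch only shows the reuse pattern: a query merges the summary cached at the previous query with the buckets that arrived since, rather than re-merging every bucket.

```python
import random


def reduce_summary(weighted_points, m, rng):
    """Stand-in summarizer (NOT a real coreset construction).

    Draws m points with probability proportional to weight and gives each
    the weight total/m, so the total weight is preserved.
    """
    if len(weighted_points) <= m:
        return list(weighted_points)
    points = [p for p, _ in weighted_points]
    weights = [w for _, w in weighted_points]
    total = sum(weights)
    sample = rng.choices(points, weights=weights, k=m)
    return [(p, total / m) for p in sample]


class CachedStreamSummary:
    """Hypothetical illustration of reusing a summary across queries."""

    def __init__(self, bucket_size=100, summary_size=50, seed=0):
        self.bucket_size = bucket_size
        self.summary_size = summary_size
        self.rng = random.Random(seed)
        self.buffer = []        # raw points not yet forming a full bucket
        self.new_buckets = []   # bucket summaries created since the last query
        self.cached = []        # summary of all buckets seen at the last query
        self.merges_at_last_query = 0

    def insert(self, point):
        self.buffer.append((point, 1.0))
        if len(self.buffer) == self.bucket_size:
            summary = reduce_summary(self.buffer, self.summary_size, self.rng)
            self.new_buckets.append(summary)
            self.buffer = []

    def query_summary(self):
        """Return weighted points summarizing the whole stream so far."""
        merged = list(self.cached)
        for bucket in self.new_buckets:
            merged.extend(bucket)
        # Only the buckets that arrived since the last query are merged.
        self.merges_at_last_query = len(self.new_buckets)
        if self.new_buckets:
            self.cached = reduce_summary(merged, self.summary_size, self.rng)
            self.new_buckets = []
        return self.cached + self.buffer


stream = CachedStreamSummary(bucket_size=10, summary_size=5, seed=1)
for i in range(35):
    stream.insert((float(i), 0.0))
first = stream.query_summary()   # merges the 3 buckets seen so far
for i in range(35, 50):
    stream.insert((float(i), 0.0))
second = stream.query_summary()  # merges only the 2 new buckets
```

To answer a clustering query, the returned weighted points would be passed to any weighted k-means solver. Repeatedly re-summarizing a cached summary, as this naive version does, would compound approximation error over time; how the paper keeps the error provably small while still reusing earlier coresets is not modeled here.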

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1701.03826/full.md

## Figures

43 figures with captions in the complete paper: https://tomesphere.com/paper/1701.03826/full.md

## References

19 references — full list in the complete paper: https://tomesphere.com/paper/1701.03826/full.md

---
Source: https://tomesphere.com/paper/1701.03826