Data Stream Clustering: A Review
Alaettin Zubaroğlu, Volkan Atalay

TL;DR
This paper reviews data stream clustering techniques, discussing concepts, challenges, algorithms, and tools, providing a comprehensive overview of current methods and open problems in real-time data stream analysis.
Contribution
It offers a detailed survey of recent data stream clustering algorithms, analyzing their base techniques, computational complexities, and clustering accuracies, and it highlights open challenges and the datasets used in data stream analysis.
Findings
Comparison of clustering algorithms based on technique and accuracy
Identification of open problems in data stream clustering
Overview of tools and datasets for data stream analysis
Abstract
The number of connected devices is steadily increasing, and these devices continuously generate data streams. Real-time processing of data streams is attracting interest despite many challenges. Clustering is one of the most suitable methods for real-time data stream processing, because it can be applied with less prior information about the data and does not need labeled instances. However, data stream clustering differs from traditional clustering in many aspects and poses several challenges. Here, we provide information regarding the concepts and common characteristics of data streams, such as concept drift, data structures for data streams, time window models, and outlier detection. We comprehensively review recent data stream clustering algorithms and analyze them in terms of the base clustering technique, computational complexity, and clustering accuracy. A comparison of these…
