Scaling up for high dimensional and high speed data streams: HSDStream
Irshad Ahmed, Irfan Ahmed, Waseem Shahzad

TL;DR
HSDStream is a new high-speed clustering algorithm for high-dimensional data streams. It uses projected subspaces and exponential moving averages to reduce memory use and speed up processing in applications such as intrusion detection.
Contribution
The paper introduces HSDStream, a novel clustering scheme that enhances speed and memory efficiency for high-dimensional data streams using projected subspaces and exponential moving averages.
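A minimal sketch of why an exponential moving average (EMA) saves memory on a stream: each cluster keeps only a per-dimension mean and variance that are updated in place, so nothing from the raw stream is stored. It assumes a hypothetical `EMASummary` class and smoothing factor `alpha`, and it uses the standard incremental EMA mean/variance update. It illustrates the general technique and is not the authors' HSDStream implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EMASummary:
    """Hypothetical constant-memory summary of the points assigned to one cluster."""
    dim: int
    alpha: float = 0.1  # hypothetical smoothing factor, 0 < alpha <= 1
    mean: List[float] = field(default_factory=list)
    var: List[float] = field(default_factory=list)
    count: int = 0

    def update(self, x: List[float]) -> None:
        """Fold one stream point into the per-dimension EMA mean and variance."""
        if len(x) != self.dim:
            raise ValueError("point dimensionality does not match summary")
        if self.count == 0:
            # The first point initializes the mean; the variance starts at zero.
            self.mean = list(x)
            self.var = [0.0] * self.dim
        else:
            for j, xj in enumerate(x):
                diff = xj - self.mean[j]
                incr = self.alpha * diff
                self.mean[j] += incr
                # Exponentially weighted variance, updated incrementally.
                self.var[j] = (1.0 - self.alpha) * (self.var[j] + diff * incr)
        self.count += 1
```

However many points arrive, the summary holds `2 * dim` floats and a counter. A larger `alpha` makes the summary follow recent points more closely, which suits an evolving stream.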
Findings
HSDStream outperforms HDDStream in cluster purity, memory usage, and processing speed.
Experimental results on the KDD dataset validate the effectiveness of HSDStream.
HSDStream is suitable for real-time applications like network monitoring and intrusion detection.
Abstract
This paper presents a novel high-speed clustering scheme for high-dimensional data streams. Data stream clustering has gained importance in many applications, such as network monitoring, intrusion detection, and real-time sensing. Clustering high-dimensional stream data is inherently difficult: the stream evolves over time, and its high dimensionality makes the task non-trivial. To tackle this problem, a projected subspace within the high-dimensional space and a limited window of data per unit of time are used for clustering. We propose a High Speed and Dimensions data stream clustering scheme (HSDStream), which employs exponential moving averages to reduce memory size and speed up the processing of the projected-subspace data stream. The proposed algorithm has been tested against HDDStream for cluster purity, memory…
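The abstract's two ingredients, a projected subspace and a limited window of recent data, can be sketched as follows. This sketch uses a PreDeCon-style preference rule that the abstract does not confirm: a dimension counts as "preferred" when the variance of the windowed points along it is at most a threshold `delta`, and preferred dimensions are weighted by `kappa` in a weighted Euclidean distance. The names `delta`, `kappa`, `window_size`, `preference_weights`, `projected_distance`, and `SlidingWindow` are hypothetical. This is not the authors' code.

```python
import math
from collections import deque
from typing import Deque, List, Sequence


def preference_weights(points: Sequence[Sequence[float]],
                       delta: float, kappa: float) -> List[float]:
    """Weight kappa for low-variance (preferred) dimensions, 1.0 otherwise."""
    if not points:
        raise ValueError("need at least one point")
    n = len(points)
    weights = []
    for j in range(len(points[0])):
        mean = sum(p[j] for p in points) / n
        var = sum((p[j] - mean) ** 2 for p in points) / n
        weights.append(kappa if var <= delta else 1.0)
    return weights


def projected_distance(x: Sequence[float], y: Sequence[float],
                       weights: Sequence[float]) -> float:
    """Weighted Euclidean distance; preferred dimensions count kappa times more."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


class SlidingWindow:
    """Keeps only the most recent window_size points (hypothetical parameter)."""

    def __init__(self, window_size: int) -> None:
        self.points: Deque[List[float]] = deque(maxlen=window_size)

    def add(self, x: Sequence[float]) -> None:
        # deque(maxlen=...) silently drops the oldest point once full.
        self.points.append(list(x))
```

With a window of points that agree on the first coordinate, that dimension becomes preferred. A new point that deviates along it is then pushed much further away than one that deviates by the same amount along a noisy dimension.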
Taxonomy
Topics: Advanced Clustering Algorithms Research · Data Stream Mining Techniques · Complex Network Analysis Techniques
