Communication-Efficient and Exact Clustering Distributed Streaming Data
Dang-Hoan Tran

TL;DR
This paper introduces a distributed clustering framework for streaming data that maintains local micro-clusters at remote sites and efficiently generates global clusters at a central coordinator, ensuring high scalability and communication efficiency.
Contribution
The paper presents a novel distributed streaming clustering framework with a coordinator and remote sites, improving scalability and communication efficiency while maintaining clustering quality.
Findings
Global clustering closely matches centralized results.
Framework achieves high scalability and communication efficiency.
Theoretical and empirical validation supports effectiveness.
Abstract
A widely used approach to clustering a single data stream is the two-phased approach, in which the online phase creates and maintains micro-clusters while the off-line phase generates the macro-clustering from the micro-clusters. We use this approach to propose a distributed framework for clustering streaming data. Our proposed framework consists of fundamental processes: one coordinator-site process and many remote-site processes. Remote-site processes can communicate directly with the coordinator process but cannot communicate with the other remote-site processes. Every remote-site process generates and maintains micro-clusters, which summarize cluster information, from its local data stream. Remote sites send their local micro-clusterings to the coordinator via serialization, or the coordinator invokes remote methods in order to get the local micro-clusterings from…
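The remote-site and coordinator roles described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not say how micro-clusters are represented, so the sketch assumes an additive summary (count, linear sum, squared sum) of the kind commonly used in two-phase stream clustering. The fixed-radius absorption rule, JSON as the serialization format, and all names (`MicroCluster`, `remote_site_summarize`, `serialize`, `coordinator_collect`) are hypothetical. Only the push path (sites sending serialized summaries) is shown, not the remote-method-invocation path.

```python
import json


class MicroCluster:
    """Assumed additive summary: point count, per-dimension linear sum and squared sum."""

    def __init__(self, n, ls, ss):
        self.n, self.ls, self.ss = n, list(ls), list(ss)

    @classmethod
    def from_point(cls, x):
        return cls(1, x, [v * v for v in x])

    def absorb(self, x):
        # Update the summary in place with one new point.
        self.n += 1
        for i, v in enumerate(x):
            self.ls[i] += v
            self.ss[i] += v * v

    def merge(self, other):
        # Summaries are additive, so merging needs no access to the raw points.
        return MicroCluster(
            self.n + other.n,
            [a + b for a, b in zip(self.ls, other.ls)],
            [a + b for a, b in zip(self.ss, other.ss)],
        )

    def centroid(self):
        return [v / self.n for v in self.ls]


def remote_site_summarize(stream, radius):
    """Hypothetical online phase at one remote site.

    Each point joins the nearest micro-cluster if its centroid lies within
    `radius`; otherwise the point opens a new micro-cluster.
    """
    clusters = []
    for x in stream:
        best, best_d = None, None
        for mc in clusters:
            d = sum((a - b) ** 2 for a, b in zip(x, mc.centroid())) ** 0.5
            if best_d is None or d < best_d:
                best, best_d = mc, d
        if best is not None and best_d <= radius:
            best.absorb(x)
        else:
            clusters.append(MicroCluster.from_point(x))
    return clusters


def serialize(clusters):
    # Only the summaries cross the network, never the raw stream.
    return json.dumps([{"n": m.n, "ls": m.ls, "ss": m.ss} for m in clusters])


def coordinator_collect(payloads):
    """Coordinator side: deserialize every site's local micro-clustering."""
    collected = []
    for payload in payloads:
        for d in json.loads(payload):
            collected.append(MicroCluster(d["n"], d["ls"], d["ss"]))
    return collected


# Two remote sites, each seeing its own local stream.
site_a = remote_site_summarize([(0.0, 0.0), (0.2, 0.0), (5.0, 5.0)], radius=1.0)
site_b = remote_site_summarize([(5.1, 5.0), (0.1, 0.1)], radius=1.0)
collected = coordinator_collect([serialize(site_a), serialize(site_b)])
```

In this sketch the coordinator ends up with the union of the local micro-clusterings. An off-line macro-clustering step (for example, a weighted clustering over the centroids) would then run on `collected`. Because the summaries are additive, merging micro-clusters from different sites gives the same summary as if one site had seen all the underlying points.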
Taxonomy
Topics: Advanced Clustering Algorithms Research · Data Stream Mining Techniques · Caching and Content Delivery
