Online Top-k-Position Monitoring of Distributed Data Streams
Alexander Mäcker, Manuel Malatyali, Friedhelm Meyer auf der Heide

TL;DR
This paper presents an online algorithm that lets a central coordinator know, at all times, which k of n distributed nodes currently observe the largest values in their data streams, while bounding the number of messages exchanged between the nodes and the coordinator.
Contribution
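It introduces a filter-based algorithm: nodes only need to communicate when their new input differs substantially from previous ones, so the coordinator tracks the current top-k nodes exactly while few messages are exchanged on "similar" inputs.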
Findings
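In expectation, the number of messages is within a factor of O((log Δ + k) log n) of an offline algorithm that sets filters optimally.
Communication is therefore bounded relative to this offline optimum rather than in absolute terms.
The competitive factor grows with log Δ, where Δ is upper bounded by the largest value observed by any node.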
Abstract
Consider n nodes connected to a single coordinator. Each node receives an individual online data stream of numbers and, at any point in time, the coordinator has to know the k nodes currently observing the largest values, for a given k between 1 and n. We design and analyze an algorithm that solves this problem while bounding the number of messages exchanged between the nodes and the coordinator. Our algorithm employs filters, which, intuitively speaking, lead to few messages being sent if the new input is "similar" to the previous ones. In expectation, the algorithm uses a number of messages that is larger by a factor of O((log Δ + k) log n) than that of an offline algorithm that sets filters in an optimal way, where Δ is upper bounded by the largest value observed by any node.
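To make the filter idea concrete, the following is a minimal Python sketch under stated assumptions. It is not the authors' algorithm. Filters are modeled here as closed intervals, filter placement uses a naive hypothetical rule (a threshold midway between the k-th and (k+1)-th largest values), and the message cost model is an assumption for illustration; the names Filter, compute_filters and monitor are hypothetical. The sketch shows only the invariant that makes filters useful: while every node's value stays inside its interval, the coordinator's top-k set remains correct and no node has to send anything.

```python
from dataclasses import dataclass


@dataclass
class Filter:
    # Hypothetical filter representation: a closed interval [lo, hi] for one node.
    lo: float
    hi: float

    def contains(self, v: float) -> bool:
        return self.lo <= v <= self.hi


def compute_filters(values, k):
    """Naive filter assignment (NOT the paper's method): separate the current
    top-k nodes from the rest with a threshold midway between the k-th and
    (k+1)-th largest values. Returns (top-k node ids, one filter per node)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    top = set(order[:k])
    if k < len(values):
        threshold = (values[order[k - 1]] + values[order[k]]) / 2
    else:
        threshold = float("-inf")  # every node is in the top-k
    filters = [
        Filter(threshold, float("inf")) if i in top else Filter(float("-inf"), threshold)
        for i in range(len(values))
    ]
    return top, filters


def monitor(stream, k):
    """stream: list of time steps, each a list with one value per node.
    Returns (total messages, top-k set after each step) under a hypothetical
    cost model: n initial reports, then 1 message per filter violation plus
    2n for a recomputation round (collect all values, send all new filters)."""
    n = len(stream[0])
    top, filters = compute_filters(stream[0], k)
    messages = n
    history = [top]
    for values in stream[1:]:
        violators = [i for i in range(n) if not filters[i].contains(values[i])]
        if violators:
            # Some value left its filter, so the top-k set may have changed.
            messages += len(violators) + 2 * n
            top, filters = compute_filters(values, k)
        # Otherwise every value is inside its filter: no messages, top-k unchanged.
        history.append(top)
    return messages, history


if __name__ == "__main__":
    msgs, hist = monitor([[5, 3, 1], [6, 2, 1], [1, 4, 7]], k=1)
    print(msgs, hist)  # 11 [{0}, {0}, {2}]
```

In this example the first update stays inside every filter and costs nothing, while the second pushes two nodes out of their filters and triggers a recomputation. The paper's contribution lies in choosing filters so that, in expectation, the total message count stays within O((log Δ + k) log n) of an optimal offline filter setting; the midpoint rule above carries no such guarantee.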
Taxonomy
Topics: Data Management and Algorithms · Optimization and Search Problems · Advanced Database Systems and Queries
