Communication Efficient Checking of Big Data Operations
Lorenz Hübschle-Schneider, Peter Sanders

TL;DR
This paper introduces probabilistic algorithms that verify the results of common big data operations in distributed systems using low communication volume, detecting incorrect results with high probability at minimal overhead.
Contribution
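It presents checkers with sublinear communication volume for sum, average, median, and minimum aggregation, as well as sorting, union, merge, and zip. The checkers are implemented in Thrill, where experiments confirm the low overhead and high failure detection rate predicted by the theoretical analysis.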
Findings
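The checkers' communication volume is sublinear in the input size.
Experiments with the Thrill implementation confirm low overhead and a high failure detection rate.
The checkers cover aggregations (sum, average, median, minimum) as well as sorting, union, merge, and zip.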
Abstract
We propose fast probabilistic algorithms with low (i.e., sublinear in the input size) communication volume to check the correctness of operations in Big Data processing frameworks and distributed databases. Our checkers cover many of the commonly used operations, including sum, average, median, and minimum aggregation, as well as sorting, union, merge, and zip. An experimental evaluation of our implementation in Thrill (Bingmann et al., 2016) confirms the low overhead and high failure detection rate predicted by theoretical analysis.
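To make the idea of a low-communication probabilistic checker concrete, the following is a minimal sketch of a sort checker based on a standard polynomial multiset fingerprint. It is an illustration under stated assumptions, not the algorithm from the paper or the Thrill implementation: the function names, the modulus `P`, and the representation of each worker's data as a Python list are all hypothetical. Each worker reduces its local data to a single number, so only one fingerprint per worker (plus boundary elements for the sortedness test) would need to be communicated.

```python
import random

# Field size for the fingerprint (an assumption of this sketch): the Mersenne
# prime 2^61 - 1. Elements are assumed to be integers in [0, P).
P = (1 << 61) - 1


def local_fingerprint(chunk, r):
    """Evaluate prod (r - x) mod P over one worker's local elements."""
    acc = 1
    for x in chunk:
        acc = acc * ((r - x) % P) % P
    return acc


def combine(fingerprints):
    """Multiply the per-worker fingerprints; stands in for a reduction step."""
    acc = 1
    for f in fingerprints:
        acc = acc * f % P
    return acc


def is_globally_sorted(chunks):
    """Check order within each chunk and across chunk boundaries.

    In a distributed setting each worker would scan its own chunk and
    exchange only its boundary element with its neighbour.
    """
    prev = None
    for chunk in chunks:
        for x in chunk:
            if prev is not None and x < prev:
                return False
            prev = x
    return True


def check_sort(input_chunks, output_chunks, rng=random):
    """Accept if the output is sorted and (probably) a permutation of the input."""
    r = rng.randrange(P)  # shared random evaluation point
    fp_in = combine(local_fingerprint(c, r) for c in input_chunks)
    fp_out = combine(local_fingerprint(c, r) for c in output_chunks)
    return fp_in == fp_out and is_globally_sorted(output_chunks)


if __name__ == "__main__":
    data = [[5, 3], [9, 1]]  # two hypothetical workers
    print(check_sort(data, [[1, 3], [5, 9]]))  # correct result
    print(check_sort(data, [[1, 3], [5, 8]]))  # one corrupted element
```

A correct result is always accepted. If the output is not a permutation of the input, the two fingerprint polynomials differ and agree on at most n of the P possible evaluation points, so a wrong result with n elements is accepted with probability at most n/P.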
