# Distributed Data Summarization in Well-Connected Networks

**Authors:** Hsin-Hao Su, Hoa T. Vu

arXiv: 1908.00236 · 2019-08-07

## TL;DR

This paper gives distributed algorithms for data summarization in well-connected networks. It computes $\sum_{i=1}^N g(f_i)$ exactly in $\tau_G \cdot 2^{O(\sqrt{\log n})}$ CONGEST rounds, where $\tau_G$ is the mixing time of the graph, and shows that each GOSSIP round can be simulated almost-perfectly in $\tilde{O}(\tau_G)$ CONGEST rounds.

## Contribution

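The paper introduces algorithms for exact and approximate data summarization in well-connected graphs. It shows that GOSSIP algorithms carry over to the CONGEST model with a factor $\tilde{O}(\tau_G)$ blow-up in rounds, and that the $\tilde{\Omega}(D + n)$ lower bound for computing some of these statistics exactly does not hold when the graph is well-connected.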

## Key findings

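- Exact computation of $\sum_{i=1}^{N} g(f_i)$ in $\tau_G \cdot 2^{O(\sqrt{\log n})}$ rounds, where $\tau_G$ is the mixing time of $G$; this also applies to finding the top $k$ most frequent elements.
- A new GOSSIP algorithm that $1 \pm \epsilon$ approximates the $p$-th frequency moment $F_p$ in $\tilde{O}(\epsilon^{-2} n^{1-k/p})$ rounds, for $p \geq 2$, when $F_0$ is at most $O\left(n^{1/(k-1)}\right)$.
- Each round of the GOSSIP model can be simulated almost-perfectly in $\tilde{O}(\tau_G)$ rounds of the CONGEST model.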
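
As a worked reading of the bounds stated in the abstract (not a result quoted from the paper body), multiplying the GOSSIP round count for approximating $F_p$ by the per-round simulation cost gives the CONGEST round count. The underbrace labels are added here for illustration, and the bound assumes the same conditions as the GOSSIP result: $p \geq 2$ and $F_0 = O\left(n^{1/(k-1)}\right)$.

$$
\underbrace{\tilde{O}\left(\epsilon^{-2} n^{1-k/p}\right)}_{\text{GOSSIP rounds}}
\cdot
\underbrace{\tilde{O}(\tau_G)}_{\text{CONGEST rounds per GOSSIP round}}
= \tilde{O}\left(\tau_G \, \epsilon^{-2} n^{1-k/p}\right)
$$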

## Abstract

We study distributed algorithms for some fundamental problems in data summarization. Given a communication graph $G$ of $n$ nodes each of which may hold a value initially, we focus on computing $\sum_{i=1}^N g(f_i)$, where $f_i$ is the number of occurrences of value $i$ and $g$ is some fixed function. This includes important statistics such as the number of distinct elements, frequency moments, and the empirical entropy of the data.

In the CONGEST model, a simple adaptation from streaming lower bounds shows that it requires $\tilde{\Omega}(D + n)$ rounds, where $D$ is the diameter of the graph, to compute some of these statistics exactly. However, these lower bounds do not hold for graphs that are well-connected. We give an algorithm that computes $\sum_{i=1}^{N} g(f_i)$ exactly in $\tau_G \cdot 2^{O(\sqrt{\log n})}$ rounds where $\tau_G$ is the mixing time of $G$. This also has applications in computing the top $k$ most frequent elements.

We demonstrate that there is a high similarity between the GOSSIP model and the CONGEST model in well-connected graphs. In particular, we show that each round of the GOSSIP model can be simulated almost-perfectly in $\tilde{O}(\tau_G)$ rounds of the CONGEST model. To this end, we develop a new algorithm for the GOSSIP model that $1 \pm \epsilon$ approximates the $p$-th frequency moment $F_p = \sum_{i=1}^N f_i^p$ in $\tilde{O}(\epsilon^{-2} n^{1-k/p})$ rounds, for $p \geq 2$, when the number of distinct elements $F_0$ is at most $O\left(n^{1/(k-1)}\right)$. This result can be translated back to the CONGEST model with a factor $\tilde{O}(\tau_G)$ blow-up in the number of rounds.
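
To make the target quantity $\sum_{i=1}^N g(f_i)$ concrete, here is a minimal centralized sketch in Python. It is not the paper's distributed algorithm: it simply gathers all node values in one place and evaluates the statistic. The names `summarize`, `values`, and the sample data are hypothetical, and the sketch assumes $g(0) = 0$, so values that never occur can be skipped.

```python
from collections import Counter
from math import log2


def summarize(values, g):
    """Return the sum of g(f_i) over every value i that occurs at least once.

    Assumes g(0) == 0, so values with zero occurrences contribute nothing.
    """
    freqs = Counter(values)  # maps value i -> number of occurrences f_i
    return sum(g(f) for f in freqs.values())


# Hypothetical input: one value per node, collected centrally.
values = [3, 1, 3, 2, 3, 1]
m = len(values)  # total number of held values

distinct = summarize(values, lambda f: 1)                      # F_0
second_moment = summarize(values, lambda f: f ** 2)            # F_2
entropy = summarize(values, lambda f: (f / m) * log2(m / f))   # empirical entropy

# The k most frequent elements, here with k = 2.
top_2 = Counter(values).most_common(2)
```

Each statistic named in the abstract corresponds to a different choice of $g$; only the frequencies $f_i$ are needed, which is why a single primitive for $\sum_i g(f_i)$ covers all of them.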

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1908.00236/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1908.00236/full.md

## References

42 references — full list in the complete paper: https://tomesphere.com/paper/1908.00236/full.md

---
Source: https://tomesphere.com/paper/1908.00236