Better Sum Estimation via Weighted Sampling

Lorenzo Beretta, Jakub Tětek

arXiv:2110.14948·cs.DS·October 29, 2021

TL;DR

This paper gives tighter bounds and simpler algorithms for estimating the total weight of a large set using weighted sampling, in both the proportional and hybrid sampling settings, and extends them to sets of unknown size and to counting edges in graphs.

Contribution

It provides tighter bounds and simpler algorithms for sum estimation in proportional and hybrid sampling settings, including unknown set sizes and applications to graph problems.

Findings

01. Improved sum estimation algorithms, with upper and lower bounds that match in both n and ε.

02. Techniques extended to the setting where the set size is unknown.

03. Methods applied to counting edges in graphs.

Abstract

Given a large set $U$ where each item $a \in U$ has weight $w(a)$, we want to estimate the total weight $W = \sum_{a \in U} w(a)$ to within a factor of $1 \pm \varepsilon$ with some constant probability $> 1/2$. Since $n = |U|$ is large, we want to do this without looking at the entire set $U$. In the traditional setting in which we are allowed to sample elements from $U$ uniformly, sampling $\Omega(n)$ items is necessary to provide any non-trivial guarantee on the estimate. Therefore, we investigate this problem in different settings: in the proportional setting we can sample items with probabilities proportional to their weights, and in the hybrid setting we can sample both proportionally and uniformly. These settings have applications, for example, in sublinear-time algorithms and distribution testing. Sum estimation in the proportional and hybrid setting has been considered…
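To make the proportional setting concrete, here is a minimal Python sketch of one simple estimator. It is not the algorithm from the paper. It assumes that $n$ is known and that every weight is strictly positive, and it relies on the identity $\mathbb{E}[1/w(a)] = \sum_{a \in U} \frac{w(a)}{W} \cdot \frac{1}{w(a)} = n/W$ when $a$ is drawn with probability $w(a)/W$. The function names and the simulated sampling oracle are hypothetical, chosen only for illustration.

```python
import random

def proportional_sample(weights, k, rng):
    """Simulated oracle (an assumption of this sketch): draw k indices,
    with replacement, each with probability proportional to its weight."""
    return rng.choices(range(len(weights)), weights=weights, k=k)

def estimate_sum_harmonic(weights, k, rng):
    """Estimate W = sum(weights) from k proportional samples.

    Uses E[1/w(a)] = n/W under proportional sampling, so
    W is approximately n / mean(1/w(a_i)).
    Assumes n is known and all weights are strictly positive.
    """
    n = len(weights)
    sampled = proportional_sample(weights, k, rng)
    mean_inverse = sum(1.0 / weights[i] for i in sampled) / k
    return n / mean_inverse

if __name__ == "__main__":
    rng = random.Random(0)
    weights = [rng.uniform(1.0, 2.0) for _ in range(1000)]
    print("true sum:", sum(weights))
    print("estimate:", estimate_sum_harmonic(weights, 20000, rng))
```

This estimator concentrates well only when the weights are of similar magnitude. A few very small weights make $1/w(a)$ heavy-tailed, and the sample size then needed can be large, which is one reason more careful estimators are studied.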

Taxonomy

Topics: Machine Learning and Algorithms · Complexity and Algorithms in Graphs · Advanced Bandit Algorithms Research