Coresets for Estimating Means and Mean Square Error with Limited Greedy Samples

Saeed Vahidian; Baharan Mirzasoleiman; Alexander Cloninger

arXiv:1906.01021 · cs.LG · June 23, 2020 · 1 cites

TL;DR

This paper introduces a scalable greedy algorithm for coreset selection that efficiently estimates means and mean square errors in graph-structured data, outperforming existing methods in accuracy and speed.

Contribution

The paper presents a novel gradient ascent-based coreset selection algorithm that handles variable node costs and provides theoretical error bounds, with extensive empirical validation.

Findings

01. Faster empirical convergence than random and clustering methods

02. Effective in semi-supervised node classification and sensor placement

03. Outperforms current state-of-the-art algorithms

Abstract

In a number of situations, collecting a function value for every data point may be prohibitively expensive, and random sampling ignores any structure in the underlying data. We introduce a scalable optimization algorithm with no correction steps (in contrast to Frank-Wolfe and its variants), a variant of gradient ascent for coreset selection in graphs, that greedily selects a weighted subset of vertices that are deemed most important to sample. Our algorithm estimates the mean of the function by taking a weighted sum only at these vertices, and we provably bound the estimation error in terms of the location and weights of the selected vertices in the graph. In addition, we consider the case where nodes have different selection costs and provide bounds on the quality of the low-cost selected coresets. We demonstrate the benefits of our algorithm on the semi-supervised node classification…
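
The idea of estimating a mean from a weighted sum over a few greedily chosen vertices can be illustrated with a minimal sketch. This is not the paper's algorithm: it swaps in a generic kernel-herding-style greedy rule with uniform weights, and the path graph, the regularized Laplacian kernel K = (I + L)^{-1}, and the function names `path_graph_laplacian`, `greedy_coreset`, and `estimate_mean` are all assumptions made for illustration.

```python
import numpy as np

def path_graph_laplacian(n):
    """Combinatorial Laplacian L = D - A of a path graph on n vertices (toy graph, assumed)."""
    A = np.zeros((n, n))
    i = np.arange(n - 1)
    A[i, i + 1] = 1.0
    A[i + 1, i] = 1.0
    return np.diag(A.sum(axis=1)) - A

def greedy_coreset(K, m):
    """Pick m distinct vertices with a herding-style greedy rule.

    Assumption: K is a symmetric positive semidefinite kernel over the vertices.
    This is a generic stand-in, not the selection rule from the paper.
    """
    n = K.shape[0]
    target = K.mean(axis=1)            # kernel mean of all vertices, evaluated at each vertex
    chosen_sum = np.zeros(n)           # running sum of K[:, s] over selected vertices s
    available = np.ones(n, dtype=bool)
    selected = []
    for t in range(m):
        # Favor vertices close to the overall mean and far from those already chosen.
        score = target - chosen_sum / (t + 1)
        score[~available] = -np.inf    # forbid picking the same vertex twice
        j = int(np.argmax(score))
        selected.append(j)
        available[j] = False
        chosen_sum += K[:, j]
    weights = np.full(m, 1.0 / m)      # uniform weights (an illustrative simplification)
    return np.array(selected), weights

def estimate_mean(f_values, indices, weights):
    """Weighted sum of the function values at the selected vertices only."""
    return float(np.dot(weights, f_values[indices]))

# Usage: compare the coreset estimate with the full mean on a smooth toy signal.
n = 50
K = np.linalg.inv(np.eye(n) + path_graph_laplacian(n))
idx, w = greedy_coreset(K, 5)
f = np.sin(np.linspace(0.0, np.pi, n))
print("coreset estimate:", estimate_mean(f, idx, w))
print("full mean:       ", f.mean())
```

Only the five selected function values are read by `estimate_mean`, which is the point of a coreset when each function value is expensive to collect. Non-uniform weights or per-vertex selection costs, both of which the abstract mentions, would change `greedy_coreset` and are left out of this sketch.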

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Gaussian Processes and Bayesian Inference · Neural Networks and Applications