Near-Optimal Data Source Selection for Bayesian Learning
Lintao Ye, Aritra Mitra, Shreyas Sundaram

TL;DR
This paper addresses the challenge of selecting data sources efficiently for Bayesian learning, proposing algorithms with provable guarantees and validating their effectiveness through numerical experiments.
Contribution
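The paper transforms the data source selection problem into an instance of the submodular set covering problem, solves it with a standard greedy algorithm that has provable performance guarantees, and introduces a fast greedy algorithm that reduces running time while keeping comparable guarantees.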
Findings
The data source selection problem is NP-hard.
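The standard greedy algorithm has provable performance guarantees, and the fast greedy algorithm achieves comparable guarantees with shorter running times.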
Numerical results confirm practical effectiveness.
Abstract
We study a fundamental problem in Bayesian learning, where the goal is to select a set of data sources with minimum cost while achieving a certain learning performance based on the data streams provided by the selected data sources. First, we show that the data source selection problem for Bayesian learning is NP-hard. We then show that the data source selection problem can be transformed into an instance of the submodular set covering problem studied in the literature, and provide a standard greedy algorithm to solve the data source selection problem with provable performance guarantees. Next, we propose a fast greedy algorithm that improves the running times of the standard greedy algorithm, while achieving performance guarantees that are comparable to those of the standard greedy algorithm. The fast greedy algorithm can also be applied to solve the general submodular set covering…
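To make the greedy idea in the abstract concrete, here is a minimal sketch of the textbook cost-benefit greedy rule for submodular set covering: repeatedly add the source with the largest marginal gain per unit cost until the performance target is met. It is an illustration, not the authors' algorithm. The function name `greedy_submodular_cover`, the toy coverage function `f`, the source names, and the costs are all hypothetical, and `f` here is a simple stand-in for the paper's learning-performance measure.

```python
from typing import Callable, Dict, FrozenSet, Set


def greedy_submodular_cover(
    costs: Dict[str, float],
    f: Callable[[FrozenSet[str]], float],
    target: float,
    tol: float = 1e-12,
) -> Set[str]:
    """Pick sources greedily until f(selected) reaches target.

    Assumes f is nondecreasing and submodular and all costs are positive.
    """
    selected: Set[str] = set()
    remaining = set(costs)
    current = f(frozenset(selected))
    while current < target - tol:
        best, best_ratio, best_value = None, 0.0, current
        # Sorted iteration makes tie-breaking deterministic.
        for s in sorted(remaining):
            value = f(frozenset(selected | {s}))
            # Truncate at the target so overshooting earns no extra credit.
            gain = min(value, target) - min(current, target)
            ratio = gain / costs[s]
            if ratio > best_ratio + tol:
                best, best_ratio, best_value = s, ratio, value
        if best is None:
            raise ValueError("target cannot be reached with the given sources")
        selected.add(best)
        remaining.remove(best)
        current = best_value
    return selected


# Hypothetical toy instance: each source "covers" a set of items, and
# f(S) counts the items covered by the sources in S (a submodular function).
coverage = {
    "a": {1, 2, 3},
    "b": {3, 4},
    "c": {4, 5, 6},
    "d": {1, 2, 3, 4, 5, 6},
}
costs = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 4.0}


def f(S: FrozenSet[str]) -> float:
    covered: Set[int] = set()
    for s in S:
        covered |= coverage[s]
    return float(len(covered))


chosen = greedy_submodular_cover(costs, f, target=6.0)
total_cost = sum(costs[s] for s in chosen)
```

In this toy instance the expensive source `d` meets the target alone, but the gain-per-cost rule prefers the two cheap sources `a` and `c`, which together reach the same target at half the cost.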
Taxonomy
Topics: Machine Learning and Algorithms · Distributed Sensor Networks and Detection Algorithms · Advanced Bandit Algorithms Research
