TL;DR
This paper introduces ThreeSieves, a fast streaming algorithm for submodular maximization. It performs well in practice, especially on well-behaved data, uses fewer resources than competing methods, and offers a provable performance guarantee that holds with high probability rather than in the worst case.
Contribution
The paper presents a novel streaming algorithm, ThreeSieves, which drops worst-case assumptions and instead maximizes submodular functions efficiently while returning a good solution with high probability.
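Submodular functions capture diminishing returns: adding an item to a smaller set increases the value at least as much as adding the same item to a larger set that contains it. The minimal illustration below uses set coverage, a standard monotone submodular function chosen here only as an example (this summary does not say which functions the paper uses); the item names and data are hypothetical.

```python
from typing import Dict, Set


def coverage(selected: Set[str], item_sets: Dict[str, Set[int]]) -> int:
    """f(S): number of distinct elements covered by the items in S."""
    covered: Set[int] = set()
    for name in selected:
        covered |= item_sets[name]
    return len(covered)


def marginal_gain(item: str, selected: Set[str], item_sets: Dict[str, Set[int]]) -> int:
    """Marginal gain f(S ∪ {item}) - f(S)."""
    return coverage(selected | {item}, item_sets) - coverage(selected, item_sets)


# Hypothetical toy data: each item covers a few elements.
item_sets = {
    "a": {1, 2, 3},
    "b": {3, 4},
    "c": {4, 5, 6},
}

small = {"a"}       # A
large = {"a", "b"}  # B, with A a subset of B
gain_small = marginal_gain("c", small, item_sets)  # "c" adds 4, 5, 6
gain_large = marginal_gain("c", large, item_sets)  # 4 is already covered
print(gain_small, gain_large)  # 3 2
```

The gain of `"c"` shrinks from 3 to 2 once `"b"` is already selected, which is exactly the diminishing-returns inequality.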
Findings
Outperforms six existing methods on eight datasets.
Uses fewer computational resources than state-of-the-art algorithms.
Effective for real-world data summarization, as demonstrated on a gamma-ray astronomy application.
Abstract
Data summarization has become a valuable tool for understanding even terabytes of data. Due to their compelling theoretical properties, submodular functions have been the focus of summarization algorithms. These algorithms offer worst-case approximation guarantees at the expense of higher computation and memory requirements. However, many practical applications do not fall under this worst case and are usually much better behaved. In this paper, we propose a new submodular function maximization algorithm called ThreeSieves, which ignores the worst case but delivers a good solution with high probability. It selects the most informative items from a data stream on the fly and maintains provable performance on a fixed memory budget. In an extensive evaluation, we compare our method against other methods on different datasets with and without concept drift. We show that our…
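The abstract describes selecting items on the fly under a fixed memory budget. The following is a minimal sketch of one way a single-threshold streaming selector can work; it is not the authors' implementation. Assumptions not stated in this summary: the candidate thresholds form a geometric grid from a known single-item value bound `m` up to `K*m`; an item is accepted when its marginal gain is at least `(v/2 - f(S)) / (K - |S|)`, the acceptance rule used by SieveStreaming; and after `T` consecutive rejections the current threshold `v` is treated as too optimistic and lowered. The function name `three_sieves_sketch` and all of its parameters are hypothetical.

```python
from typing import Callable, Iterable, List, TypeVar

Item = TypeVar("Item")


def three_sieves_sketch(
    stream: Iterable[Item],
    f: Callable[[List[Item]], float],
    K: int,
    m: float,
    eps: float = 0.1,
    T: int = 50,
) -> List[Item]:
    """Hypothetical single-threshold streaming selector (illustrative only).

    Assumes f is monotone submodular with f([]) == 0 and that m is a known
    upper bound on the value of any single item.
    """
    # Geometric grid of candidate thresholds m, m(1+eps), ... up to K*m,
    # tried from the highest (most optimistic) down to the lowest.
    thresholds = []
    v = m
    while v <= K * m:
        thresholds.append(v)
        v *= 1 + eps
    thresholds.reverse()

    idx, rejections = 0, 0
    summary: List[Item] = []
    f_summary = f(summary)
    for item in stream:
        if len(summary) >= K:  # fixed memory budget: at most K items kept
            break
        v = thresholds[idx]
        gain = f(summary + [item]) - f_summary  # marginal gain of this item
        # Assumed SieveStreaming-style acceptance rule for threshold v.
        if gain >= (v / 2 - f_summary) / (K - len(summary)):
            summary.append(item)
            f_summary += gain
            rejections = 0
        else:
            rejections += 1
            # Assumed rule: after T consecutive rejections, drop to the
            # next lower threshold (if one is left).
            if rejections >= T and idx < len(thresholds) - 1:
                idx += 1
                rejections = 0
    return summary


def coverage(sets: List[set]) -> int:
    """Toy monotone submodular function: number of distinct elements covered."""
    return len(set().union(*sets))


stream = [{1}, {2}, {3}, {4, 5, 6}, {7, 8}, {9}]
summary = three_sieves_sketch(stream, coverage, K=3, m=3, eps=0.5, T=2)
print(len(summary), coverage(summary))  # 3 6
```

In this run the first two items are rejected under the highest threshold, the threshold is lowered after `T=2` rejections, and the next three items fill the budget. Keeping one threshold at a time, rather than one sieve per threshold as SieveStreaming does, holds memory at `K` items in this sketch; the trade-off is that a threshold lowered too early is never raised again.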
