Beyond $1/2$-Approximation for Submodular Maximization on Massive Data Streams
Ashkan Norouzi-Fard, Jakub Tarnawski, Slobodan Mitrović, Amir Zandieh, Aida Mousavifar, and Ola Svensson

TL;DR
This paper introduces SALSA, a streaming algorithm that surpasses the 0.5-approximation barrier for submodular maximization under random element arrival, with practical benefits demonstrated in real-world data tasks.
Contribution
The paper presents SALSA, the first low-memory, single-pass streaming algorithm achieving better than 0.5-approximation for submodular maximization under random order assumptions.
Findings
SALSA outperforms previous methods in experiments.
Improves approximation factor beyond 0.5 under random order.
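When elements arrive in arbitrary order, no low-memory, single-pass algorithm can achieve better than a 0.5-approximation, so the random-order assumption is necessary.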
Abstract
Many tasks in machine learning and data mining, such as data diversification, non-parametric learning, kernel machines, clustering, etc., require extracting a small but representative summary from a massive dataset. Often, such problems can be posed as maximizing a submodular set function subject to a cardinality constraint. We consider this question in the streaming setting, where elements arrive over time at a fast pace and thus we need to design an efficient, low-memory algorithm. One such method, proposed by Badanidiyuru et al. (2014), always finds a $0.5$-approximate solution. Can this approximation factor be improved? We answer this question affirmatively by designing a new algorithm SALSA for streaming submodular maximization. It is the first low-memory, single-pass algorithm that improves the factor $0.5$, under the natural assumption that elements arrive in a random order. We…
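For context, the 0.5 baseline credited to Badanidiyuru et al. (2014) is threshold based. The sketch below is a minimal Python illustration of a single-threshold pass, not SALSA and not the authors' code. It assumes a guess of the optimal value is already known (`opt_guess` is a hypothetical parameter; the full method runs many guesses in parallel), and it uses a toy coverage objective (`make_coverage` and the example data are made up for illustration).

```python
from typing import Callable, Dict, Hashable, Iterable, List, Set


def make_coverage(items: Dict[Hashable, Set[int]]) -> Callable[[List[Hashable]], float]:
    """Return a toy monotone submodular function: size of the union of chosen sets."""
    def f(chosen: List[Hashable]) -> float:
        covered: Set[int] = set()
        for key in chosen:
            covered |= items[key]
        return float(len(covered))
    return f


def threshold_stream(
    stream: Iterable[Hashable],
    f: Callable[[List[Hashable]], float],
    k: int,
    opt_guess: float,
) -> List[Hashable]:
    """Single pass: keep an element if its marginal gain meets the threshold.

    Threshold rule (assumed form of the classic sieve rule):
        gain >= (opt_guess / 2 - f(S)) / (k - |S|)
    """
    summary: List[Hashable] = []
    value = f(summary)
    for element in stream:
        if len(summary) >= k:
            break  # cardinality constraint reached
        gain = f(summary + [element]) - value
        if gain >= (opt_guess / 2 - value) / (k - len(summary)):
            summary.append(element)
            value += gain
    return summary


# Hypothetical example data: four candidate sets over a small universe.
example_items = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}, "d": {1, 2}}
example_f = make_coverage(example_items)
example_summary = threshold_stream(["a", "b", "c", "d"], example_f, k=2, opt_guess=5.0)
```

On this toy stream the pass keeps the first two elements and reaches a value of 4 against the guess of 5. The summary and one running value are the only state, which is what makes the approach low-memory.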
Taxonomy
Topics: Complexity and Algorithms in Graphs · Privacy-Preserving Technologies in Data · Data Mining Algorithms and Applications
