Randomized Greedy Learning for Non-monotone Stochastic Submodular Maximization Under Full-bandit Feedback
Fares Fourati, Vaneet Aggarwal, Christopher John Quinn, Mohamed-Slim Alouini

TL;DR
This paper introduces Randomized Greedy Learning (RGL), an algorithm for non-monotone stochastic submodular maximization under full-bandit feedback. RGL achieves a $\frac{1}{2}$-regret upper bound of $\tilde{O}(n T^{2/3})$ and empirically outperforms other full-bandit variants.
Contribution
The paper proposes the RGL algorithm for non-monotone stochastic submodular maximization under full-bandit feedback, extending prior work to more general reward functions.
Findings
RGL achieves a $\frac{1}{2}$-regret upper bound of $\tilde{O}(n T^{2/3})$ for horizon $T$ and number of arms $n$.
In experiments, RGL outperforms other full-bandit variants in both submodular and non-submodular settings.
The regret guarantee holds when the reward function is not necessarily monotone and is submodular only in expectation.
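For readers unfamiliar with the term, the block below writes out the usual definition of $\frac{1}{2}$-regret from the combinatorial bandit literature. It is a sketch under assumptions, not a quotation from the paper: the symbols $f$ (expected reward of a set), $S^*$ (an optimal set), and $S_t$ (the set played at round $t$) are introduced here for illustration only.

```latex
% Assumed notation: f(S) is the expected reward of playing set S,
% S^* is a maximizer of f over all subsets of the n arms,
% and S_t is the set the learner plays at round t.
\mathcal{R}_{1/2}(T)
  \;=\; \frac{1}{2}\, T\, f(S^*)
  \;-\; \mathbb{E}\!\left[\sum_{t=1}^{T} f(S_t)\right],
\qquad
S^* \in \arg\max_{S \subseteq \{1,\dots,n\}} f(S).
```

The factor $\frac{1}{2}$ reflects that the learner is compared against half of the optimal value rather than the optimum itself, which is the best approximation ratio generally attainable in polynomial time for unconstrained non-monotone submodular maximization.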
Abstract
We investigate the problem of unconstrained combinatorial multi-armed bandits with full-bandit feedback and stochastic rewards for submodular maximization. Previous works investigate the same problem assuming a submodular and monotone reward function. In this work, we study a more general problem, i.e., when the reward function is not necessarily monotone, and the submodularity is assumed only in expectation. We propose the Randomized Greedy Learning (RGL) algorithm and theoretically prove that it achieves a $\frac{1}{2}$-regret upper bound of $\tilde{O}(n T^{2/3})$ for horizon $T$ and number of arms $n$. We also show in experiments that RGL empirically outperforms other full-bandit variants in submodular and non-submodular settings.
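The abstract does not spell out the steps of RGL, so the following is only a rough sketch of what a randomized greedy learner under full-bandit feedback could look like. It assumes an adaptation of the classical randomized double-greedy procedure for unconstrained submodular maximization, with each set value estimated by averaging repeated noisy plays. The names `randomized_greedy_sketch`, `play`, and `m` are hypothetical and do not come from the paper.

```python
import random
from typing import Callable, FrozenSet, Set


def randomized_greedy_sketch(
    n: int,
    play: Callable[[FrozenSet[int]], float],
    m: int,
    rng: random.Random,
) -> Set[int]:
    """Illustrative randomized double greedy with noisy set evaluations.

    n    : number of arms (assumed indexed 0..n-1)
    play : hypothetical bandit oracle; playing a set returns one noisy reward
    m    : number of plays averaged per set (an assumed tuning parameter)
    rng  : source of randomness for the add/remove decisions
    """

    def estimate(subset: Set[int]) -> float:
        # Full-bandit feedback: only the reward of the whole set is observed,
        # so the set's value is estimated by averaging m independent plays.
        frozen = frozenset(subset)
        return sum(play(frozen) for _ in range(m)) / m

    lower: Set[int] = set()           # grows from the empty set
    upper: Set[int] = set(range(n))   # shrinks from the full set

    for arm in range(n):
        # Estimated gain of adding the arm to the lower set.
        gain_add = estimate(lower | {arm}) - estimate(lower)
        # Estimated gain of removing the arm from the upper set.
        gain_remove = estimate(upper - {arm}) - estimate(upper)

        a = max(gain_add, 0.0)
        b = max(gain_remove, 0.0)
        # Add with probability a / (a + b); add outright if both are zero.
        prob_add = 1.0 if a + b == 0.0 else a / (a + b)

        if rng.random() < prob_add:
            lower.add(arm)
        else:
            upper.discard(arm)

    # After all arms are processed, lower and upper coincide.
    return lower
```

In an explore-then-commit style use of this sketch, the `4 * n * m` plays made inside the loop would form the exploration phase, and the returned set would then be played for the remaining rounds. How the number of plays per set should scale with the horizon is not stated in the text above and is left open here.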
Taxonomy
Topics: Advanced Bandit Algorithms Research · Complexity and Algorithms in Graphs · Optimization and Search Problems
