Logarithmic Regret for Unconstrained Submodular Maximization Stochastic Bandit
Julien Zhou (Thoth, STATIFY), Pierre Gaillard (Thoth), Thibaud Rahier, Julyan Arbel (STATIFY)

TL;DR
This paper introduces a new algorithm for online unconstrained submodular maximization with stochastic bandit feedback, achieving improved regret bounds and characterizing the problem's hardness transition.
Contribution
It proposes the DG-ETC algorithm, combining Double-Greedy with explore-then-commit, and provides new regret bounds along with a hardness measure for the problem.
Findings
Achieves $O(d \log(dT))$ problem-dependent regret bound.
Achieves $O(dT^{2/3} \log(dT)^{1/3})$ problem-free regret bound.
Introduces a hardness measure for the transition between regret regimes.
Abstract
We address the online unconstrained submodular maximization problem (Online USM), in a setting with stochastic bandit feedback. In this framework, a decision-maker receives noisy rewards from a non-monotone submodular function taking values in a known bounded interval. This paper proposes Double-Greedy - Explore-then-Commit (DG-ETC), adapting the Double-Greedy approach from the offline and online full-information settings. DG-ETC satisfies an $O(d \log(dT))$ problem-dependent upper bound for the $1/2$-approximate pseudo-regret, as well as an $O(dT^{2/3} \log(dT)^{1/3})$ problem-free one at the same time, outperforming existing approaches. In particular, we introduce a problem-dependent notion of hardness characterizing the transition between the logarithmic and polynomial regimes for the upper bounds.
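As a rough illustration of how an explore-then-commit scheme can be layered on Double-Greedy, the Python sketch below estimates each marginal gain by averaging a fixed number of noisy evaluations, applies the randomized Double-Greedy rule to those estimates, and returns the set to commit to for the remaining rounds. It is not the paper's DG-ETC: the function name `double_greedy_etc`, the fixed budget `samples_per_query`, and the toy reward are assumptions made for illustration, and the actual algorithm may set its exploration lengths differently.

```python
import random
from typing import Callable, FrozenSet, Tuple


def double_greedy_etc(
    noisy_f: Callable[[FrozenSet[int]], float],
    d: int,
    samples_per_query: int,
    rng: random.Random,
) -> Tuple[FrozenSet[int], int]:
    """Illustrative sketch (not the paper's DG-ETC).

    Runs randomized Double-Greedy over items 0..d-1, replacing each exact
    function value by the mean of `samples_per_query` noisy evaluations.
    Returns the committed set and the number of exploration rounds used.
    """
    lower: set = set()            # grows from the empty set
    upper: set = set(range(d))    # shrinks from the full ground set
    rounds = 0

    def estimate(items) -> float:
        # Each noisy evaluation costs one round of bandit feedback.
        nonlocal rounds
        s = frozenset(items)
        rounds += samples_per_query
        return sum(noisy_f(s) for _ in range(samples_per_query)) / samples_per_query

    for i in range(d):
        # Estimated gain of adding i to `lower` and of removing i from `upper`.
        gain_add = estimate(lower | {i}) - estimate(lower)
        gain_remove = estimate(upper - {i}) - estimate(upper)
        a, b = max(gain_add, 0.0), max(gain_remove, 0.0)
        # Randomized rule: add with probability a / (a + b); add if both are zero.
        p_add = 1.0 if a + b == 0.0 else a / (a + b)
        if rng.random() < p_add:
            lower.add(i)
        else:
            upper.discard(i)

    # After d steps lower == upper; this is the set played in the commit phase.
    return frozenset(lower), rounds


if __name__ == "__main__":
    rng = random.Random(0)
    weights = [1.0, -1.0, 2.0]  # hypothetical toy reward (modular, hence submodular)

    def noisy_f(s: FrozenSet[int]) -> float:
        # Bounded uniform noise, matching the bounded-reward setting.
        return sum(weights[i] for i in s) + rng.uniform(-0.1, 0.1)

    committed, used = double_greedy_etc(noisy_f, d=3, samples_per_query=50, rng=rng)
    print(sorted(committed), used)
```

A fixed budget keeps the sketch short: exploration costs `4 * d * samples_per_query` rounds regardless of how easy the instance is. A problem-dependent guarantee would instead require stopping each step once the estimated gains are well separated, which this sketch does not attempt.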
Taxonomy
Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques · Distributed Sensor Networks and Detection Algorithms
Methods: Network On Network
