Semi-Bandit Learning for Monotone Stochastic Optimization

Arpit Agarwal; Rohan Ghuge; Viswanath Nagarajan; Zhengjia Zhuo

arXiv:2312.15427·cs.LG·August 14, 2025·1 cites

Open Access

TL;DR

This paper introduces a semi-bandit learning algorithm for monotone stochastic optimization problems, enabling near-optimal solutions without prior knowledge of probability distributions, even with limited feedback.

Contribution

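The paper presents a generic online learning algorithm whose regret, measured against the best approximation algorithm under known distributions, is low for a broad class of monotone stochastic problems. The algorithm works with semi-bandit feedback and extends to censored feedback.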

Findings

01

Achieves a $\sqrt{T \log T}$ regret bound relative to the best approximation algorithm.

02

Extends to censored and binary feedback scenarios.

03

Applies to problems like prophet inequality and stochastic knapsack.

Abstract

Stochastic optimization is a widely used approach for optimization under uncertainty, where uncertain input parameters are modeled by random variables. Exact or approximation algorithms have been obtained for several fundamental problems in this area. However, a significant limitation of this approach is that it requires full knowledge of the underlying probability distributions. Can we still get good (approximation) algorithms if these distributions are unknown, and the algorithm needs to learn them through repeated interactions? In this paper, we resolve this question for a large class of "monotone" stochastic problems, by providing a generic online learning algorithm with $\sqrt{T \log T}$ regret relative to the best approximation algorithm (under known distributions). Importantly, our online algorithm works in a semi-bandit setting, where in each period, the algorithm only…
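To make the setting concrete, here is a minimal Python sketch of repeated play on a prophet-inequality instance, one of the problems the summary lists. It is not the paper's algorithm, which the truncated abstract does not describe. It assumes the common reading of semi-bandit feedback (only the variables the policy actually inspects are revealed) and uses a simple plug-in rule: estimate the median of the maximum from past samples and use it as a stopping threshold. All names (`play_round`, `plugin_threshold`, `run`) and the three example distributions are hypothetical.

```python
import random
import statistics


def play_round(threshold, samplers, rng):
    """Inspect variables in order and stop at the first value >= threshold.

    Returns (reward, observed), where `observed` holds the realized values
    of only the variables inspected before stopping (semi-bandit feedback).
    """
    observed = []
    for sample in samplers:
        x = sample(rng)
        observed.append(x)
        if x >= threshold:
            return x, observed
    return 0.0, observed  # never stopped: no reward this round


def plugin_threshold(history, rng, n_draws=200):
    """Estimate the median of max_i X_i by resampling past observations.

    Assumption of this sketch: if any variable has no samples yet, return
    +inf so that the next round inspects (and observes) every variable.
    """
    if any(len(h) == 0 for h in history):
        return float("inf")
    maxima = [max(rng.choice(h) for h in history) for _ in range(n_draws)]
    return statistics.median(maxima)


def run(samplers, T, seed=0):
    """Play T rounds, learning only from the values each round reveals."""
    rng = random.Random(seed)
    history = [[] for _ in samplers]
    total = 0.0
    for _ in range(T):
        tau = plugin_threshold(history, rng)
        reward, observed = play_round(tau, samplers, rng)
        total += reward
        for i, x in enumerate(observed):
            history[i].append(x)
    return total, history


# Hypothetical instance: the learner never sees these distributions directly.
samplers = [
    lambda rng: rng.uniform(0.0, 1.0),
    lambda rng: rng.expovariate(2.0),
    lambda rng: rng.uniform(0.0, 2.0),
]
total, history = run(samplers, T=500)
counts = [len(h) for h in history]  # samples collected per variable
```

In this sketch `counts` is non-increasing in the variable index, because a later variable is observed only in rounds where no earlier variable met the threshold. That uneven data collection is what distinguishes this setting from full feedback, where every variable would be observed in every round.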

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques