An efficient algorithm for learning with semi-bandit feedback

Gergely Neu; Gábor Bartók

arXiv:1305.2732·cs.LG·May 14, 2013

Open Access

TL;DR

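This paper introduces an efficient algorithm for online combinatorial optimization with semi-bandit feedback. It combines Follow-the-Perturbed-Leader (FPL) with a loss estimation procedure called Geometric Resampling (GR), attains expected regret O(m sqrt(dT log d)), and applies to any decision set that admits efficient offline combinatorial optimization.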

Contribution

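The authors combine FPL with Geometric Resampling, a novel loss estimation procedure, to obtain an algorithm that can be implemented efficiently whenever offline optimization over the decision set is efficient. As a side result, they improve the best known regret bounds for FPL.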

Findings

01

Expected regret is O(m sqrt(d T log d)) after T rounds.

02

The best known regret bound for FPL in the full-information setting is improved to O(m^{3/2} sqrt(T log d)).

03

The algorithm can be implemented efficiently for any decision set that admits efficient offline combinatorial optimization.
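To get a feel for the O(m sqrt(dT log d)) bound listed above, the short sketch below evaluates its growth term for a few horizons. The function name `regret_rate`, the problem size, and the use of the natural logarithm are assumptions made for illustration; the constants hidden by the O-notation are ignored, so the numbers indicate scaling only, not actual regret.

```python
import math


def regret_rate(m, d, T):
    """Growth term m * sqrt(d * T * log d); hidden constants are ignored."""
    return m * math.sqrt(d * T * math.log(d))


if __name__ == "__main__":
    m, d = 5, 100  # hypothetical problem size, not taken from the paper
    for T in (10_000, 40_000, 160_000):
        rate = regret_rate(m, d, T)
        # Print the horizon, the growth term, and the per-round average.
        print(T, round(rate), round(rate / T, 3))
```

Because the term grows with the square root of T, quadrupling the horizon only doubles it, and the per-round average shrinks as T increases.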

Abstract

We consider the problem of online combinatorial optimization under semi-bandit feedback. The goal of the learner is to sequentially select its actions from a combinatorial decision set so as to minimize its cumulative loss. We propose a learning algorithm for this problem based on combining the Follow-the-Perturbed-Leader (FPL) prediction method with a novel loss estimation procedure called Geometric Resampling (GR). Contrary to previous solutions, the resulting algorithm can be efficiently implemented for any decision set where efficient offline combinatorial optimization is possible at all. Assuming that the elements of the decision set can be described with d-dimensional binary vectors with at most m non-zero entries, we show that the expected regret of our algorithm after T rounds is O(m sqrt(dT log d)). As a side result, we also improve the best known regret bounds for FPL in the…
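The combination of FPL and Geometric Resampling described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' reference implementation: the decision set is taken to be all subsets of exactly m out of d components (so the offline optimizer is a simple sort), the perturbations are i.i.d. exponential, and the names `oracle_m_set`, `fpl_action`, `geometric_resampling`, `run_fpl_gr`, the learning rate `eta`, and the cap `max_draws` are all hypothetical choices made for this example.

```python
import random


def oracle_m_set(scores, m):
    """Offline optimizer for the assumed m-set decision set:
    return the indices of the m smallest scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return set(order[:m])


def fpl_action(cum_est, eta, m, rng):
    """One FPL draw: subtract Exp(1) noise from the scaled cumulative
    loss estimates and call the offline optimizer."""
    scores = [eta * est - rng.expovariate(1.0) for est in cum_est]
    return oracle_m_set(scores, m)


def geometric_resampling(cum_est, eta, m, played, max_draws, rng):
    """For each played component, count fresh FPL draws until the
    component is selected again, capped at max_draws."""
    counts = {i: max_draws for i in played}
    waiting = set(played)
    for n in range(1, max_draws + 1):
        if not waiting:
            break
        redraw = fpl_action(cum_est, eta, m, rng)
        for i in waiting & redraw:
            counts[i] = n
        waiting -= redraw
    return counts


def run_fpl_gr(losses, m, eta, max_draws, seed=0):
    """Play FPL with GR loss estimates on a fixed loss sequence."""
    rng = random.Random(seed)
    cum_est = [0.0] * len(losses[0])
    total = 0.0
    for loss in losses:
        played = fpl_action(cum_est, eta, m, rng)
        # Semi-bandit feedback: only losses of played components are used.
        total += sum(loss[i] for i in played)
        counts = geometric_resampling(cum_est, eta, m, played, max_draws, rng)
        for i in played:
            # The resampling count stands in for 1 / P(component i is played).
            cum_est[i] += counts[i] * loss[i]
    return total, cum_est


if __name__ == "__main__":
    # Toy instance: components 0 and 1 are cheap, the rest are expensive.
    toy_losses = [[0.1, 0.1, 0.9, 0.9, 0.9] for _ in range(200)]
    total_loss, estimates = run_fpl_gr(toy_losses, m=2, eta=0.1, max_draws=50)
    print(total_loss, estimates)
```

The point of the resampling step is that the learner never has to compute the probability of selecting a component explicitly. It only needs to call the same offline optimizer on fresh perturbations, which is why the approach carries over to any decision set with an efficient offline solver.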

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Optimization and Search Problems