Trend Detection based Regret Minimization for Bandit Problems

Paresh Nakhe; Rebecca Reiffenhäuser

arXiv:1709.05156·cs.LG·September 18, 2017

TL;DR

This paper introduces a trend detection approach for multi-armed bandit problems with structured losses, achieving improved regret bounds by leveraging the trend properties of the losses.

Contribution

It proposes a novel trend detection method that enhances regret minimization in structured bandit problems, compatible with existing algorithms like Exp3.

Findings

01

Achieves regret of order $\tilde{O}(N \sqrt{TK})$ for single action selection.

02

Attains regret of order $\tilde{O}(Nm \sqrt{TK})$ when multiple actions are chosen.

03

Demonstrates advantages over traditional strategies through theoretical analysis.

Abstract

We study a variation of the classical multi-armed bandits problem. In this problem, the learner has to make a sequence of decisions, picking from a fixed set of choices. In each round, she receives as feedback only the loss incurred from the chosen action. Conventionally, this problem has been studied when the losses of the actions are drawn from an unknown distribution or when they are adversarial. In this paper, we study this problem when the losses of the actions also satisfy certain structural properties and, in particular, exhibit a trend structure. When this is true, we show that using trend detection, we can achieve regret of order $\tilde{O} (N \sqrt{TK})$ with respect to a switching strategy for the version of the problem where a single action is chosen in each round and $\tilde{O} (Nm \sqrt{TK})$ when $m$ actions are chosen each round. This guarantee is a significant…
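
As a point of reference for the bandit feedback model described in the abstract, the following is a minimal sketch of the standard loss-based Exp3 update, the kind of base algorithm the Contribution section names as compatible. It is not the paper's trend-detection method. The function name `exp3`, the `loss_fn` callback, the assumption that losses lie in [0, 1], and the learning rate used in the usage lines are all illustrative assumptions, not details taken from the paper.

```python
import math
import random


def exp3(loss_fn, num_actions, horizon, eta, rng=None):
    """Standard Exp3 for losses in [0, 1] (illustrative sketch only).

    loss_fn(t, action) is a hypothetical callback returning the loss of the
    chosen action in round t; the learner never sees the other losses.
    Returns the total loss incurred and the number of pulls per action.
    """
    rng = rng or random.Random(0)
    weights = [1.0] * num_actions
    pulls = [0] * num_actions
    total_loss = 0.0

    for t in range(horizon):
        # Sample an action in proportion to the current weights.
        total_weight = sum(weights)
        probs = [w / total_weight for w in weights]
        action = rng.choices(range(num_actions), weights=probs)[0]

        # Bandit feedback: only the chosen action's loss is observed.
        loss = loss_fn(t, action)
        total_loss += loss
        pulls[action] += 1

        # Importance-weighted loss estimate, unbiased for the true loss.
        estimate = loss / probs[action]
        weights[action] *= math.exp(-eta * estimate)

        # Rescale so the largest weight is 1, which avoids underflow.
        largest = max(weights)
        weights = [w / largest for w in weights]

    return total_loss, pulls


if __name__ == "__main__":
    K, T = 3, 2000
    # A common textbook learning rate; an assumption, not the paper's tuning.
    eta = math.sqrt(2 * math.log(K) / (T * K))
    # Hypothetical stationary losses: action 0 is best throughout.
    total, pulls = exp3(lambda t, a: 0.1 if a == 0 else 0.9, K, T, eta)
    print(total, pulls)
```

With stationary losses as in the usage lines, this baseline competes with the single best fixed action. The abstract's guarantees are instead stated with respect to a switching strategy, which is a stronger benchmark than the one this sketch targets.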
