Small-loss bounds for online learning with partial information

Thodoris Lykouris, Karthik Sridharan, and Eva Tardos

arXiv:1711.03639·cs.LG·July 28, 2021

TL;DR

This paper introduces algorithms for adversarial online learning with partial feedback that achieve data-dependent small-loss regret bounds. The results extend to settings such as semi-bandits and contextual bandits, with optimal guarantees for the classical bandit and semi-bandit problems.

Contribution

It provides the first data-dependent small-loss regret bounds for general feedback graphs and extends these results to multiple online learning scenarios using a black-box approach.

Findings

01. Achieved small-loss regret bounds of o(α L*) with high probability.

02. Extended results to semi-bandits, contextual bandits, and shifting comparators.

03. Provided optimal bounds for classical bandit and semi-bandit problems, answering open questions.

Abstract

We consider the problem of adversarial (non-stochastic) online learning with partial information feedback, where at each round, a decision maker selects an action from a finite set of alternatives. We develop a black-box approach for such problems where the learner observes as feedback only losses of a subset of the actions that includes the selected action. When losses of actions are non-negative, under the graph-based feedback model introduced by Mannor and Shamir, we offer algorithms that attain the so-called "small-loss" $o(α L^{⋆})$ regret bounds with high probability, where $α$ is the independence number of the graph, and $L^{⋆}$ is the loss of the best action. Prior to our work, there was no data-dependent guarantee for general feedback graphs even for pseudo-regret (without dependence on the number of actions, i.e. utilizing the increased information feedback).…
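To make the graph-based feedback model concrete, the following is a minimal Python sketch of a single round: the learner plays an action and sees the losses of that action and of its neighbors in the feedback graph. It is not the paper's algorithm. The function names `observe` and `importance_weighted_estimates`, the three-action graph, the loss values, and the sampling distribution are all hypothetical, and the loss estimator shown is the standard importance-weighted one from the feedback-graph literature, used here only as an illustration.

```python
def observe(graph, losses, action):
    """Return the losses revealed to the learner after playing `action`.

    `graph[a]` is the set of actions whose loss is revealed when `a` is
    played. The selected action itself is always revealed.
    """
    visible = set(graph[action]) | {action}
    return {b: losses[b] for b in visible}


def importance_weighted_estimates(graph, probs, observed):
    """Divide each observed loss by the probability of observing it.

    Unobserved actions get an estimate of 0. In expectation over the
    learner's random choice, each estimate equals the true loss.
    """
    n = len(probs)
    estimates = [0.0] * n
    for b, loss in observed.items():
        # b is observed whenever the learner plays b or any action that reveals b.
        p_obs = sum(probs[a] for a in range(n) if a == b or b in graph[a])
        estimates[b] = loss / p_obs
    return estimates


# Hypothetical instance: actions 0 and 1 reveal each other, action 2 reveals only itself.
graph = {0: {1}, 1: {0}, 2: set()}
losses = [0.2, 0.5, 0.9]     # non-negative losses for this round (illustrative)
probs = [0.5, 0.25, 0.25]    # learner's sampling distribution (illustrative)

observed = observe(graph, losses, action=0)
estimates = importance_weighted_estimates(graph, probs, observed)
```

In this toy graph the largest set of mutually non-adjacent actions is {0, 2} (or {1, 2}), so the independence number is 2 even though there are 3 actions. That gap between the independence number and the number of actions is the extra information the abstract refers to.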
