Small-loss bounds for online learning with partial information
Thodoris Lykouris, Karthik Sridharan, and Eva Tardos

TL;DR
This paper introduces algorithms for adversarial online learning with partial feedback that achieve data-dependent small-loss regret bounds, and extends them to settings such as semi-bandits and contextual bandits, with optimal guarantees for the classical bandit and semi-bandit problems.
Contribution
It provides the first data-dependent small-loss regret bounds for general feedback graphs and extends these results to multiple online learning scenarios using a black-box approach.
Findings
Achieved small-loss regret bounds of o(α L*) with high probability.
Extended results to semi-bandits, contextual bandits, and shifting comparators.
Provided optimal bounds for classical bandit and semi-bandit problems, answering open questions.
Abstract
We consider the problem of adversarial (non-stochastic) online learning with partial information feedback, where at each round, a decision maker selects an action from a finite set of alternatives. We develop a black-box approach for such problems where the learner observes as feedback only losses of a subset of the actions that includes the selected action. When losses of actions are non-negative, under the graph-based feedback model introduced by Mannor and Shamir, we offer algorithms that attain the so-called "small-loss" o(αL*) regret bounds with high probability, where α is the independence number of the graph, and L* is the loss of the best action. Prior to our work, there was no data-dependent guarantee for general feedback graphs even for pseudo-regret (without dependence on the number of actions, i.e. utilizing the increased information feedback).…
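To make the graph-based feedback model concrete, here is a minimal Python sketch. It is not the paper's algorithm. It only illustrates which losses the learner observes after picking an action, together with the standard importance-weighted loss estimate commonly used with feedback graphs. The function names, the adjacency-list representation, and the example graph are assumptions made for illustration.

```python
def observed_losses(graph, losses, chosen):
    """Losses revealed to the learner: the chosen action plus its neighbors.

    graph is a hypothetical adjacency list: graph[i] lists the actions whose
    losses are also revealed when action i is played.
    """
    visible = {chosen} | set(graph[chosen])
    return {a: losses[a] for a in visible}


def observation_probs(graph, probs):
    """Probability that each action's loss is observed when sampling from probs."""
    obs = [0.0] * len(probs)
    for i, p in enumerate(probs):
        for a in {i} | set(graph[i]):
            obs[a] += p
    return obs


def loss_estimates(graph, losses, probs, chosen):
    """Importance-weighted estimates: observed loss divided by its observation probability."""
    obs = observation_probs(graph, probs)
    seen = observed_losses(graph, losses, chosen)
    return [seen[a] / obs[a] if a in seen else 0.0 for a in range(len(probs))]


# Illustrative example: actions 0 and 1 reveal each other; action 2 reveals only itself.
graph = {0: [1], 1: [0], 2: []}
losses = [0.2, 0.5, 0.9]
probs = [0.5, 0.25, 0.25]

# Averaging the estimates over the learner's random choice recovers the true losses.
expected = [
    sum(probs[c] * loss_estimates(graph, losses, probs, c)[a] for c in range(3))
    for a in range(3)
]
```

In this example the largest set of actions with no edge between them has size 2 (for instance actions 0 and 2), so the independence number α is 2, while the number of actions is 3. The richer the feedback graph, the smaller α becomes relative to the number of actions.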
