From Bandits to Experts: A Tale of Domination and Independence

Noga Alon; Nicolò Cesa-Bianchi; Claudio Gentile; Yishay Mansour

arXiv:1307.4564 · cs.LG · July 18, 2013 · 34 cites

Open Access

TL;DR

This paper characterizes regret in the partial observability model for multi-armed bandits in terms of graph parameters, and shows that in the undirected case optimal regret can be achieved without accessing the observability graph before selecting an action.

Contribution

It characterizes regret in the directed observability model in terms of the dominating and independence numbers of the observability graph, and demonstrates that in the undirected model optimal regret is achievable without accessing the graph before each action is selected.

Findings

01

In the directed observability model, regret is characterized in terms of the dominating and independence numbers of the observability graph.

02

Optimal regret is achievable in undirected models without prior access to the observability graph.

03

Both results are obtained with variants of the Exp3 algorithm that operate on the observability graph in a time-efficient manner.
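
To make the two graph parameters named in the findings concrete, here is a minimal brute-force sketch for small graphs. It is an illustration only and not taken from the paper: the function names and the convention that every action observes itself are assumptions, and independence is computed with arc orientation ignored.

```python
from itertools import combinations


def independence_number(n, arcs):
    """Size of the largest vertex set with no arc (in either direction) between two members."""
    linked = {frozenset(a) for a in arcs}  # orientation is ignored here
    for k in range(n, 0, -1):
        for subset in combinations(range(n), k):
            if all(frozenset(pair) not in linked for pair in combinations(subset, 2)):
                return k
    return 0


def domination_number(n, arcs):
    """Size of the smallest set D such that every vertex is in D or has an arc coming from D."""
    reach = {u: {u} for u in range(n)}  # assumption: each vertex covers itself
    for u, v in arcs:
        reach[u].add(v)
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            covered = set().union(*(reach[u] for u in subset))
            if len(covered) == n:
                return k
    return 0


def undirected(edges):
    """Represent each undirected edge as two opposite arcs."""
    return list(edges) + [(v, u) for u, v in edges]
```

For example, a directed star whose center points to four leaves has `domination_number(5, [(0, 1), (0, 2), (0, 3), (0, 4)])` equal to 1 and `independence_number` equal to 4, so the two parameters can differ substantially on the same graph. Both functions enumerate subsets and are only practical for a handful of vertices.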

Abstract

We consider the partial observability model for multi-armed bandits, introduced by Mannor and Shamir. Our main result is a characterization of regret in the directed observability model in terms of the dominating and independence numbers of the observability graph. We also show that in the undirected case, the learner can achieve optimal regret without even accessing the observability graph before selecting an action. Both results are shown using variants of the Exp3 algorithm operating on the observability graph in a time-efficient manner.
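
The idea of running Exp3 on an observability graph can be sketched as follows. This is a hedged illustration under stated assumptions, not the paper's pseudocode: the function name, the fixed learning rate `eta`, the per-round renormalization, and the representation `observes[i]` (the set of actions whose losses are revealed when action `i` is played, assumed to contain `i` itself) are all hypothetical choices. Each observed loss is divided by the probability that it would have been observed in that round.

```python
import math
import random


def exp3_graph_feedback(losses, observes, eta, seed=0):
    """Exponential weights with importance-weighted loss estimates on a fixed observability graph.

    losses[t][i] is the loss of action i at round t, assumed to lie in [0, 1].
    observes[i] is the set of actions revealed when i is played (assumed to include i).
    """
    rng = random.Random(seed)
    K = len(observes)
    weights = [1.0] * K
    total_loss = 0.0
    for loss_t in losses:
        W = sum(weights)
        p = [w / W for w in weights]
        played = rng.choices(range(K), weights=p)[0]
        total_loss += loss_t[played]
        for i in observes[played]:
            # Probability that the loss of action i is observed in this round.
            q_i = sum(p[j] for j in range(K) if i in observes[j])
            estimate = loss_t[i] / q_i
            weights[i] *= math.exp(-eta * estimate)
        # Rescale so the largest weight is 1; this leaves p unchanged next round.
        top = max(weights)
        weights = [w / top for w in weights]
    return total_loss, weights
```

With `observes[i] = {i}` for every action this reduces to the standard bandit setting, and with `observes[i]` equal to the full action set it reduces to the experts setting, which is the range the title refers to. The learning rate is left as a free parameter here; the tuning that yields the regret bounds is given in the paper.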

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Reinforcement Learning in Robotics