Phased Exploration with Greedy Exploitation in Stochastic Combinatorial Partial Monitoring Games

Sougata Chaudhuri; Ambuj Tewari

arXiv:1608.06403·cs.GT·August 24, 2016

TL;DR

This paper introduces a new phased exploration algorithm for combinatorial partial monitoring games that improves regret bounds and simplifies implementation compared to previous methods, especially in large action spaces.

Contribution

The paper proposes the PEGE framework, which is simpler to implement and relaxes assumptions required by earlier work, and introduces PEGE2, which achieves $O(\log T)$ regret without dependence on the size of the action space.

Findings

01

PEGE achieves $O(T^{2/3}\sqrt{\log T})$ distribution independent regret.

02

PEGE2 attains $O(\log T)$ regret, matching the distribution dependent bound of the earlier GCB algorithm while reducing dependence on the size of the action space.

03

The algorithms are applicable to practical online ranking problems with partial feedback.

Abstract

Partial monitoring games are repeated games where the learner receives feedback that might be different from the adversary's move or even the reward gained by the learner. Recently, a general model of combinatorial partial monitoring (CPM) games was proposed \cite{lincombinatorial2014}, where the learner's action space can be exponentially large and the adversary samples its moves from a bounded, continuous space, according to a fixed distribution. The paper gave a confidence bound based algorithm (GCB) that achieves $O(T^{2/3} \log T)$ distribution independent and $O(\log T)$ distribution dependent regret bounds. The implementation of their algorithm depends on two separate offline oracles, and the distribution dependent regret additionally requires the existence of a unique optimal action for the learner. Adopting their CPM model, our first contribution is a Phased Exploration with Greedy…
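
As a rough illustration of the phased exploration and greedy exploitation idea named in the abstract, the Python sketch below alternates a short exploration pass over all actions with a growing block of greedy play. It is not the authors' PEGE algorithm: it assumes a toy setting with a handful of actions and direct noisy reward feedback rather than the CPM feedback model, and the function name `phased_explore_greedy`, the uniform noise, and the choice of `phase` exploitation rounds in phase number `phase` are all hypothetical choices made for this example.

```python
import random


def phased_explore_greedy(means, horizon, seed=0):
    """Toy phased exploration / greedy exploitation loop (illustrative only).

    means   : hypothetical true mean reward of each action (unknown to the learner)
    horizon : total number of rounds to play
    Returns the list of actions played, one per round.
    """
    rng = random.Random(seed)
    k = len(means)
    sums = [0.0] * k      # running sum of observed rewards per action
    counts = [0] * k      # number of exploration plays per action
    plays = []
    t = 0
    phase = 1
    while t < horizon:
        # Exploration: play every action once and record its noisy reward.
        for a in range(k):
            if t >= horizon:
                break
            reward = means[a] + rng.uniform(-0.5, 0.5)  # assumed noise model
            sums[a] += reward
            counts[a] += 1
            plays.append(a)
            t += 1
        if t >= horizon:
            break
        # Exploitation: commit to the empirically best action for `phase`
        # rounds. Feedback from these rounds is deliberately ignored.
        greedy = max(range(k), key=lambda a: sums[a] / counts[a])
        for _ in range(phase):
            if t >= horizon:
                break
            plays.append(greedy)
            t += 1
        phase += 1
    return plays
```

Calling `phased_explore_greedy([0.0, 2.0], 200)` returns a list of 200 action indices. Because exploitation blocks lengthen with each phase while every exploration pass stays at one play per action, the fraction of rounds spent exploring shrinks over time, which is the intuition behind phased schemes of this kind.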
