Phased Exploration with Greedy Exploitation in Stochastic Combinatorial Partial Monitoring Games
Sougata Chaudhuri, Ambuj Tewari

TL;DR
This paper introduces a new phased exploration algorithm for combinatorial partial monitoring games that improves regret bounds and simplifies implementation compared to previous methods, especially in large action spaces.
Contribution
It proposes the PEGE framework, which is simpler to implement and relaxes assumptions made by earlier work, and introduces PEGE2, whose $O(\log T)$ regret matches the previous best bound with less dependence on the size of the action space.
Findings
PEGE achieves $O(T^{2/3}\sqrt{\log T})$ distribution-independent regret.
PEGE2 attains $O(\log T)$ regret, matching the previous best bound but with less dependence on the size of the action space.
The algorithms are applicable to practical online ranking problems with partial feedback.
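The $T^{2/3}\sqrt{\log T}$ rate in the first finding has the shape one typically sees when exploration and exploitation are separated. The following is a heuristic back-of-the-envelope calculation, not the paper's proof. The symbols $n$ (total number of exploration rounds) and $\epsilon$ (accuracy of the estimated mean reward after exploration) are illustrative assumptions and do not appear in the summary above. Assuming bounded rewards and a Hoeffding-type concentration bound, exploration costs at most $O(n)$ regret and exploiting an $\epsilon$-accurate estimate costs $O(\epsilon)$ per round:

$$
\epsilon \approx \sqrt{\frac{\log T}{n}}, \qquad
R(T) \lesssim n + T\epsilon = n + T\sqrt{\frac{\log T}{n}} .
$$

Choosing $n = T^{2/3}$ gives

$$
R(T) \lesssim T^{2/3} + T \cdot \frac{\sqrt{\log T}}{T^{1/3}} = T^{2/3}\left(1 + \sqrt{\log T}\right) = O\!\left(T^{2/3}\sqrt{\log T}\right).
$$

Under these assumptions the exploration cost and the exploitation cost are balanced up to the logarithmic factor, which is why a rate of this form is plausible for a scheme that explores in dedicated rounds.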
Abstract
Partial monitoring games are repeated games where the learner receives feedback that might differ from the adversary's move or even from the reward gained by the learner. Recently, a general model of combinatorial partial monitoring (CPM) games was proposed \cite{lincombinatorial2014}, where the learner's action space can be exponentially large and the adversary samples its moves from a bounded, continuous space, according to a fixed distribution. The paper gave a confidence-bound-based algorithm (GCB) that achieves distribution-independent and distribution-dependent regret bounds. The implementation of their algorithm depends on two separate offline oracles, and the distribution-dependent regret additionally requires the existence of a unique optimal action for the learner. Adopting their CPM model, our first contribution is a Phased Exploration with Greedy…
