An Adaptive Algorithm for Finite Stochastic Partial Monitoring
Gabor Bartok (University of Alberta), Navid Zolghadr (University of Alberta), Csaba Szepesvari (University of Alberta)

TL;DR
This paper introduces an adaptive anytime algorithm for finite stochastic partial monitoring that achieves near-optimal regret across various problem complexities, including easy and hard instances, with specific benefits in dynamic pricing scenarios.
Contribution
The paper proposes a novel adaptive algorithm that attains near-optimal regret in finite stochastic partial monitoring, adapting to problem difficulty and opponent strategies.
Findings
Achieves minimax regret within logarithmic factors for all problem types.
Attains logarithmic individual regret for easy problems.
Demonstrates O(√T) regret in dynamic pricing under certain conditions.
Abstract
We present a new anytime algorithm that achieves near-optimal regret for any instance of finite stochastic partial monitoring. In particular, the new algorithm achieves the minimax regret, within logarithmic factors, for both "easy" and "hard" problems. For easy problems, it additionally achieves logarithmic individual regret. Most importantly, the algorithm is adaptive in the sense that if the opponent strategy is in an "easy region" of the strategy space then the regret grows as if the problem was easy. As an implication, we show that under some reasonable additional assumptions, the algorithm enjoys an O(\sqrt{T}) regret in Dynamic Pricing, proven to be hard by Bartok et al. (2011).
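To make the Dynamic Pricing setting concrete, the following is a minimal sketch of how such a game can be written as a finite partial monitoring problem with a loss matrix and a feedback matrix. It is not the paper's algorithm. The formulation is an assumption based on the usual description of the game: the learner posts a price, the customer has a hidden maximum price, a sale loses the unclaimed margin, and a failed sale costs a fixed amount. The names `dynamic_pricing_game`, `expected_regret_per_round`, and `storage_cost` are hypothetical.

```python
import numpy as np

def dynamic_pricing_game(n_prices, storage_cost):
    """Build loss and feedback matrices for a toy dynamic pricing game.

    Rows index the learner's posted price, columns index the customer's
    hidden maximum price. (Assumed formulation, for illustration only.)
    """
    loss = np.empty((n_prices, n_prices))
    feedback = np.empty((n_prices, n_prices), dtype=int)
    for i in range(n_prices):        # learner's action: posted price
        for j in range(n_prices):    # outcome: customer's maximum price
            sold = i <= j
            # A sale loses the margin the learner failed to charge;
            # a failed sale incurs a constant storage cost.
            loss[i, j] = (j - i) if sold else storage_cost
            # The learner observes only whether the sale happened.
            feedback[i, j] = int(sold)
    return loss, feedback

def expected_regret_per_round(loss, p):
    """Per-round expected regret of each action under opponent strategy p."""
    expected = loss @ p
    return expected - expected.min()

loss, feedback = dynamic_pricing_game(n_prices=3, storage_cost=2.0)
p = np.array([0.2, 0.5, 0.3])        # hypothetical opponent strategy
gaps = expected_regret_per_round(loss, p)
```

The learner never sees entries of `loss`, only entries of `feedback`, which is what separates partial monitoring from a full-information game. The vector `gaps` shows how far each fixed price is from the best one under the chosen opponent strategy `p`.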
Taxonomy
Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Gaussian Processes and Bayesian Inference
