Instance-Dependent Regret Bounds for Nonstochastic Linear Partial Monitoring
Federico Di Gennaro, Khaled Eldowa, Nicolò Cesa-Bianchi

TL;DR
This paper proves instance-dependent regret bounds for nonstochastic linear partial monitoring. This setting generalizes linear bandits by decoupling loss and feedback, and the bounds adapt to how difficult the game is.
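One common way to formalize a round of linear partial monitoring is as follows. The loss of an action is an inner product between its loss vector and the adversary's outcome vector. The feedback is a separate linear map of that same outcome. The sketch below is illustrative only: the names `play_round`, `loss_vec`, `obs_matrix` and the toy action set are hypothetical and not taken from the paper. It shows how the choice of observation matrix recovers linear bandits (each action sees its own loss), full information (each action sees the whole outcome), or a decoupled structure in between.

```python
import numpy as np


def play_round(loss_vec, obs_matrix, outcome):
    """One round of a linear partial monitoring game (illustrative sketch).

    loss_vec:   loss vector of the chosen action, shape (d,)
    obs_matrix: observation matrix of the chosen action, shape (m, d)
    outcome:    outcome vector picked by the adversary, shape (d,)
    Returns (loss, feedback); the learner sees only `feedback`.
    """
    loss = float(loss_vec @ outcome)   # loss is linear in the outcome
    feedback = obs_matrix @ outcome    # observation is linear in the outcome
    return loss, feedback


d = 3
loss_vecs = np.eye(d)              # hypothetical action set: standard basis vectors
y = np.array([0.2, 0.5, 0.9])      # hypothetical adversarial outcome

# Linear bandit: each action observes exactly its own loss (obs_matrix = loss_vec^T).
bandit = [a.reshape(1, -1) for a in loss_vecs]
# Full information: every action observes the whole outcome (obs_matrix = identity).
full_info = [np.eye(d) for _ in loss_vecs]
# Decoupled feedback: action 0 observes coordinate 1 instead of its own loss.
decoupled = [np.array([[0.0, 1.0, 0.0]]), np.eye(d)[[1]], np.eye(d)[[2]]]

for name, obs in [("bandit", bandit), ("full", full_info), ("decoupled", decoupled)]:
    loss, fb = play_round(loss_vecs[0], obs[0], y)
    print(name, loss, fb)
```

In every case the learner incurs the same loss for action 0. Only what it gets to see changes, which is the sense in which loss and feedback are decoupled.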
Contribution
It provides instance-specific regret bounds for nonstochastic linear partial monitoring. These bounds capture how the game structure affects learning performance, and they depend on that structure more transparently than previous guarantees for this paradigm.
Findings
Achieves regret in easy games
Achieves regret in hard games
Bounds are tight in key cases
Abstract
In contrast to the classic formulation of partial monitoring, linear partial monitoring can model infinite outcome spaces, while imposing a linear structure on both the losses and the observations. This setting can be viewed as a generalization of linear bandits where loss and feedback are decoupled in a flexible manner. In this work, we address a nonstochastic (adversarial), finite-action version of the problem through a simple instance of the exploration-by-optimization method that is amenable to efficient implementation. We derive regret bounds that depend on the game structure in a more transparent manner than previous theoretical guarantees for this paradigm. Our bounds feature instance-specific quantities that reflect the degree of alignment between observations and losses, and resemble known guarantees in the stochastic setting. Notably, they achieve the standard rate…
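A basic way to see "alignment between observations and losses" is through the standard observability condition from the partial monitoring literature. The loss difference between actions a and b can be estimated from the feedback of a set of actions exactly when the difference of their loss vectors lies in the row space of those actions' stacked observation matrices. Roughly, easy (locally observable) games need only observations from actions near a and b, while hard (globally observable) games may need observations from other actions. The sketch below checks this condition with least squares. It is a hypothetical helper (`difference_observable` is not from the paper), and it does not reproduce the paper's instance-specific quantities.

```python
import numpy as np


def difference_observable(loss_a, loss_b, obs_matrices, tol=1e-9):
    """Check whether <loss_a - loss_b, y> is a linear function of the feedback
    produced by the given observation matrices (illustrative sketch).

    This holds iff loss_a - loss_b lies in the row space of the stacked matrices.
    Returns (is_observable, weights), where the weights combine the stacked
    observation rows into the loss difference when the check succeeds.
    """
    stacked = np.vstack(obs_matrices)  # shape (total_rows, d)
    diff = np.asarray(loss_a, dtype=float) - np.asarray(loss_b, dtype=float)
    # Solve stacked.T @ w = diff in the least-squares sense, then test the residual.
    w, *_ = np.linalg.lstsq(stacked.T, diff, rcond=None)
    return bool(np.linalg.norm(stacked.T @ w - diff) <= tol), w


E = np.eye(3)  # hypothetical actions with loss vectors e_1, e_2, e_3
# Bandit feedback from both actions: the difference is observable.
print(difference_observable(E[0], E[1], [E[[0]], E[[1]]])[0])  # -> True
# Coordinate 0 is never observed, so the difference cannot be estimated.
print(difference_observable(E[0], E[1], [E[[1]], E[[2]]])[0])  # -> False
```

The least-squares weights double as an unbiasedness certificate. If the check passes, the same linear combination of observed feedback equals the loss difference for every outcome.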
