Efficient learning by implicit exploration in bandit problems with side observations

Tomas Kocak; Gergely Neu; Michal Valko; Remi Munos

arXiv:2604.24555 · cs.LG · April 28, 2026 · 130 cites

TL;DR

This paper introduces new algorithms for online learning under partial observability that achieve near-optimal regret without prior knowledge of the observation system, relying on a novel implicit exploration strategy.

Contribution

The paper presents the first algorithms for bandit problems with side observations that enjoy near-optimal regret guarantees without having to know the observation system before selecting actions. Both rely on a new implicit exploration method.
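For reference, a minimal statement of an implicit-exploration loss estimate, written in notation we assume here rather than quote from the paper: at round t the learner plays from distribution p_t, the environment's directed graph G_t reveals the loss of action i whenever the played action is i or has an edge to i, O_{t,i} indicates whether that loss was revealed, and gamma_t >= 0 is an exploration parameter.

```latex
o_{t,i} = p_{t,i} + \sum_{j:\,(j \to i) \in G_t} p_{t,j},
\qquad
\hat{\ell}_{t,i} = \frac{\ell_{t,i}\, O_{t,i}}{o_{t,i} + \gamma_t},
\qquad
\mathbb{E}_t\big[\hat{\ell}_{t,i}\big] = \frac{o_{t,i}}{o_{t,i} + \gamma_t}\,\ell_{t,i} \le \ell_{t,i}.
```

Under this assumed form, the estimate is biased downward (optimistic), which pushes the learner toward rarely observed actions without mixing in an explicit uniform distribution. Computing o_{t,i} requires G_t only after the action has been played.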

Findings

01

The algorithms achieve near-optimal regret without prior knowledge of the observation system.

02

Implicit exploration is more efficient than previously studied exploration strategies, both computationally and in how it uses the revealed information.

03

A second, always computationally efficient algorithm is proposed, at the cost of a more complex tuning mechanism.

Abstract

We consider online learning problems under a partial observability model capturing situations where the information conveyed to the learner is between full information and bandit feedback. In the simplest variant, we assume that in addition to its own loss, the learner also gets to observe losses of some other actions. The revealed losses depend on the learner's action and a directed observation system chosen by the environment. For this setting, we propose the first algorithm that enjoys near-optimal regret guarantees without having to know the observation system before selecting its actions. Along similar lines, we also define a new partial information setting that models online combinatorial optimization problems where the feedback received by the learner is between semi-bandit and full feedback. As the predictions of our first algorithm cannot always be computed efficiently in this…
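The feedback model described in the abstract can be made concrete with a small simulation. This is a hypothetical sketch, not the authors' code. It assumes the observation system is a directed graph stored as graph[j], the set of actions whose losses are revealed when j is played, and that the graph is used only after the action is chosen. It pairs exponential weights with an implicit-exploration estimate of the form loss_i / (o_i + gamma). The helper names (observation_probs, ix_estimates, run_ix_learner), the random environment, and the tuning eta = sqrt(ln K / (K T)), gamma = eta / 2 are all our assumptions; the paper's tuning may differ.

```python
import math
import random


def observation_probs(p, graph):
    """Return o[i], the probability that action i's loss is revealed.

    graph[j] is the (assumed) set of actions whose losses are revealed when
    action j is played; the played action always reveals its own loss.
    """
    o = [0.0] * len(p)
    for j, out_neighbours in enumerate(graph):
        for i in out_neighbours | {j}:
            o[i] += p[j]
    return o


def ix_estimates(losses, observed, o, gamma):
    """Implicit-exploration estimate: loss_i / (o_i + gamma) if revealed, else 0."""
    return [
        losses[i] / (o[i] + gamma) if i in observed else 0.0
        for i in range(len(losses))
    ]


def run_ix_learner(loss_seq, graph_seq, eta, gamma, seed=0):
    """Exponential weights fed with IX estimates; returns the learner's total loss."""
    rng = random.Random(seed)
    k = len(loss_seq[0])
    cum_est = [0.0] * k  # cumulative estimated loss of each action
    total = 0.0
    for losses, graph in zip(loss_seq, graph_seq):
        # Exponential weights, shifted by the minimum for numerical stability.
        m = min(cum_est)
        w = [math.exp(-eta * (c - m)) for c in cum_est]
        s = sum(w)
        p = [x / s for x in w]
        action = rng.choices(range(k), weights=p)[0]
        total += losses[action]
        # Feedback: own loss plus the losses of this round's out-neighbours.
        observed = graph[action] | {action}
        # The graph is used only after the action has been chosen.
        o = observation_probs(p, graph)
        est = ix_estimates(losses, observed, o, gamma)
        cum_est = [c + e for c, e in zip(cum_est, est)]
    return total


if __name__ == "__main__":
    rng = random.Random(1)
    K, T = 5, 2000
    # Hypothetical environment: Bernoulli losses, action 0 has the lowest mean.
    means = [0.3, 0.5, 0.5, 0.6, 0.7]
    loss_seq = [[float(rng.random() < mu) for mu in means] for _ in range(T)]
    # Hypothetical observation system: each edge j -> i present w.p. 0.3.
    graph_seq = [
        [{i for i in range(K) if i != j and rng.random() < 0.3} for j in range(K)]
        for _ in range(T)
    ]
    # Hypothetical bandit-style tuning, not the paper's tuning.
    eta = math.sqrt(math.log(K) / (K * T))
    gamma = eta / 2
    learner = run_ix_learner(loss_seq, graph_seq, eta, gamma)
    best = min(sum(row[i] for row in loss_seq) for i in range(K))
    print(f"learner loss {learner:.0f}, best fixed action {best:.0f}")
```

The design choice to note is that adding gamma to the denominator keeps every estimate bounded by 1/gamma (for losses in [0, 1]) without mixing explicit uniform exploration into p, and that observation_probs is the only place the round's graph enters the estimate.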

Code & Models

Datasets

misovalko/my-research-papers
dataset· 103 dl