Information-Gathering in Latent Bandits

Alexander Galozy; Slawomir Nowaczyk

arXiv:2207.03635·cs.LG·July 11, 2022

Open Access

TL;DR

This paper introduces a method for explicit information-gathering in latent bandits: the agent may choose an arm that does not offer the highest expected reward but helps identify the latent state, which improves state estimation and reduces regret.

Contribution

The paper proposes a novel approach to explicit information-gathering in latent bandits that improves state estimation and reduces regret compared with existing methods.

Findings

01 Significant regret reduction on synthetic data

02 Improved state estimation accuracy

03 Enhanced performance on real-world datasets

Abstract

In the latent bandit problem, the learner has access to reward distributions and -- for the non-stationary variant -- transition models of the environment. The reward distributions are conditioned on the arm and unknown latent states. The goal is to use the reward history to identify the latent state, allowing for the optimal choice of arms in the future. The latent bandit setting lends itself to many practical applications, such as recommender and decision support systems, where rich data allows the offline estimation of environment models with online learning remaining a critical component. Previous solutions in this setting always choose the highest reward arm according to the agent's beliefs about the state, not explicitly considering the value of information-gathering arms. Such information-gathering arms do not necessarily provide the highest reward, thus may never be chosen by an…
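The belief-based arm selection described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm. It assumes a stationary problem with two latent states, three arms, and Gaussian rewards with a known noise level; the names `MEANS`, `SIGMA`, `update_belief`, and `greedy_arm` are hypothetical and chosen only for illustration. The reward table is built so that the arm with the highest expected reward under a uniform belief has the same mean in both states, and therefore carries no information about the state.

```python
import math

# Hypothetical reward model (an assumption, not taken from the paper):
# MEANS[s][a] is the known mean reward of arm a in latent state s.
MEANS = [
    [0.6, 0.5, 0.0],  # latent state 0
    [0.6, 0.5, 1.0],  # latent state 1
]
SIGMA = 0.1  # assumed known Gaussian noise standard deviation


def gaussian_pdf(x, mean, sigma):
    """Density of a normal distribution N(mean, sigma^2) at x."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def update_belief(belief, arm, reward, means, sigma):
    """Bayes update of the belief over latent states after one observed reward."""
    weights = [
        b * gaussian_pdf(reward, means[s][arm], sigma)
        for s, b in enumerate(belief)
    ]
    total = sum(weights)
    if total == 0.0:
        # Numerical underflow for every state: keep the previous belief.
        return list(belief)
    return [w / total for w in weights]


def expected_rewards(belief, means):
    """Expected reward of each arm, averaged over the belief."""
    n_arms = len(means[0])
    return [
        sum(b * means[s][a] for s, b in enumerate(belief))
        for a in range(n_arms)
    ]


def greedy_arm(belief, means):
    """Arm with the highest expected reward under the current belief."""
    values = expected_rewards(belief, means)
    return max(range(len(values)), key=values.__getitem__)


prior = [0.5, 0.5]

# The greedy choice is arm 0, whose mean is identical in both states,
# so observing its reward leaves the belief unchanged.
first_arm = greedy_arm(prior, MEANS)
after_greedy = update_belief(prior, first_arm, 0.6, MEANS, SIGMA)

# Arm 2 separates the two states: a reward near 1.0 points to state 1.
after_probe = update_belief(prior, 2, 1.0, MEANS, SIGMA)
arm_after_probe = greedy_arm(after_probe, MEANS)
```

In this toy setup, an agent that only ever plays the greedy arm keeps its uniform belief indefinitely, while a single pull of arm 2 resolves the state and changes the greedy choice. That is the kind of situation in which an information-gathering arm is worth pulling even though its expected reward under the current belief is not the highest.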

Taxonomy

Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Recommender Systems and Techniques