Identifiable Latent Bandits: Leveraging observational data for personalized decision-making
Ahmet Zahid Balcıoğlu, Newton Mwai, Emil Carlsson, Fredrik D. Johansson

TL;DR
This paper introduces an identifiable latent bandit framework that uses observational data and nonlinear ICA to improve personalized decision-making efficiency, especially in data-scarce scenarios like medicine.
Contribution
It proposes a novel latent bandit model with provable identifiability from observational data, enabling faster and more accurate personalized decisions.
Findings
The method substantially improves on baseline methods in simulated environments.
The approach reduces exploration time compared to classical bandits.
The method is validated in semi-synthetic settings.
Abstract
Sequential decision-making algorithms such as multi-armed bandits can find optimal personalized decisions, but are notoriously sample-hungry. In personalized medicine, for example, training a bandit from scratch for every patient is typically infeasible, as the number of trials required is much larger than the number of decision points for a single patient. To combat this, latent bandits offer rapid exploration and personalization beyond what context variables alone can offer, provided that a latent variable model of problem instances can be learned consistently. However, existing works give no guidance as to how such a model can be found. In this work, we propose an identifiable latent bandit framework that leads to optimal decision-making with a shorter exploration time than classical bandits by learning from historical records of decisions and outcomes. Our method is based on…
