Adaptive Exploration for Latent-State Bandits

Jikai Jin; Kenneth Hung; Sanath Kumar Krishnamurthy; Baoyi Shi; Congshan Zhang

arXiv:2602.05139·cs.LG·February 19, 2026

Open Access

TL;DR

This paper introduces adaptive, state-model-free bandit algorithms that use lagged features and probing to effectively handle hidden, changing states, improving decision-making in uncertain environments.

Contribution

It presents novel algorithms that implicitly track latent states without explicit modeling, enhancing robustness and efficiency in non-stationary bandit problems.

Findings

1. Outperforms classical bandit algorithms in diverse settings
2. Learns optimal policies without explicit state modeling
3. Demonstrates robustness to non-stationary reward environments

Abstract

The multi-armed bandit problem is a core framework for sequential decision-making under uncertainty, but classical algorithms often fail in environments with hidden, time-varying states that confound reward estimation and optimal action selection. We address key challenges arising from unobserved confounders, such as biased reward estimates and limited state information, by introducing a family of state-model-free bandit algorithms that leverage lagged contextual features and coordinated probing strategies. These implicitly track latent states and disambiguate state-dependent reward patterns. Our methods and their adaptive variants can learn optimal policies without explicit state modeling, combining computational efficiency with robust adaptation to non-stationary rewards. Empirical results across diverse settings demonstrate superior performance over classical approaches, and we…
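To make the idea of lagged contextual features concrete, here is a minimal toy sketch. It is not the paper's algorithm and it omits the probing component entirely. Every detail is an assumption chosen for illustration: the two-state sticky hidden chain, the Bernoulli reward probabilities (0.9 and 0.1), the tabular epsilon-greedy learner, and the function name simulate. The learner never sees the hidden state; with use_lag=True it conditions its reward estimates on the previous arm and previous reward, which act as a noisy proxy for that state.

```python
import random
from collections import defaultdict


def simulate(horizon=5000, use_lag=True, stay_prob=0.95, eps=0.1, seed=0):
    """Average reward of epsilon-greedy on a toy 2-armed latent-state bandit.

    Hypothetical environment: arm k pays Bernoulli(0.9) when the hidden
    state equals k and Bernoulli(0.1) otherwise. The hidden state keeps its
    value with probability stay_prob at each step.
    """
    rng = random.Random(seed)
    state = 0
    context = (0, 0)  # lagged features: (previous arm, previous reward)
    sums = defaultdict(float)  # reward totals per (context key, arm)
    counts = defaultdict(int)  # pull counts per (context key, arm)
    total = 0.0

    for _ in range(horizon):
        # Without lagged features every step shares one context (None).
        key = context if use_lag else None

        if rng.random() < eps:
            arm = rng.randrange(2)  # uniform exploration
        else:
            # Untried (context, arm) pairs get an optimistic estimate of 1.0.
            means = [
                sums[(key, a)] / counts[(key, a)] if counts[(key, a)] else 1.0
                for a in range(2)
            ]
            arm = 0 if means[0] >= means[1] else 1

        p = 0.9 if arm == state else 0.1
        reward = 1 if rng.random() < p else 0

        sums[(key, arm)] += reward
        counts[(key, arm)] += 1
        total += reward

        context = (arm, reward)
        if rng.random() > stay_prob:
            state = 1 - state  # hidden state flips, unseen by the learner

    return total / horizon


if __name__ == "__main__":
    print("with lagged features:   ", simulate(use_lag=True))
    print("without lagged features:", simulate(use_lag=False))
```

In this toy setting both arms have the same marginal mean reward, so a learner that ignores history has little to distinguish them, while the lagged learner can pick up a win-stay, lose-shift style rule from its per-context estimates. The tabular context is a simplification for readability; the abstract does not specify how the lagged features are used.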

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Recommender Systems and Techniques