Adaptive Exploration for Latent-State Bandits
Jikai Jin, Kenneth Hung, Sanath Kumar Krishnamurthy, Baoyi Shi, Congshan Zhang

TL;DR
This paper introduces state-model-free bandit algorithms, with adaptive variants, that use lagged contextual features and coordinated probing to implicitly track hidden, time-varying states that would otherwise bias reward estimates.
Contribution
The paper presents a family of bandit algorithms that track latent states implicitly, through lagged contextual features and coordinated probing rather than an explicit state model, combining computational efficiency with adaptation to non-stationary rewards.
Findings
Outperforms classical bandit algorithms in empirical evaluations across diverse settings
Learns optimal policies without explicit state modeling
Demonstrates robustness to non-stationary reward environments
Abstract
The multi-armed bandit problem is a core framework for sequential decision-making under uncertainty, but classical algorithms often fail in environments with hidden, time-varying states that confound reward estimation and optimal action selection. We address key challenges arising from unobserved confounders, such as biased reward estimates and limited state information, by introducing a family of state-model-free bandit algorithms that leverage lagged contextual features and coordinated probing strategies. These implicitly track latent states and disambiguate state-dependent reward patterns. Our methods and their adaptive variants can learn optimal policies without explicit state modeling, combining computational efficiency with robust adaptation to non-stationary rewards. Empirical results across diverse settings demonstrate superior performance over classical approaches, and we…
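As a rough illustration of the lagged-feature idea described in the abstract, the sketch below shows a toy setting in which the previous arm and reward serve as the context for a tabular bandit. It is not the paper's algorithm. The names (TwoStateEnv, LaggedEpsGreedy, run), the two-state environment, the payoff and switching probabilities, and the plain epsilon-greedy exploration (used here in place of the paper's coordinated probing) are all assumptions made for illustration.

```python
import random
from collections import defaultdict


class TwoStateEnv:
    """Hypothetical toy environment: a hidden binary state flips rarely."""

    def __init__(self, switch_prob=0.05, seed=0):
        self.rng = random.Random(seed)
        self.switch_prob = switch_prob
        self.state = 0  # never shown to the agent

    def pull(self, arm):
        # The arm matching the hidden state pays with prob. 0.9, the other with 0.1.
        p = 0.9 if arm == self.state else 0.1
        reward = 1 if self.rng.random() < p else 0
        # The hidden state switches with a small probability after each pull.
        if self.rng.random() < self.switch_prob:
            self.state = 1 - self.state
        return reward


class LaggedEpsGreedy:
    """Epsilon-greedy over per-context running means.

    With use_lag=True the context is the previous (arm, reward) pair,
    a simple lagged feature. With use_lag=False the context is always
    None, which reduces to ordinary epsilon-greedy.
    """

    def __init__(self, n_arms=2, epsilon=0.1, use_lag=True, seed=1):
        self.rng = random.Random(seed)
        self.n_arms = n_arms
        self.epsilon = epsilon
        self.use_lag = use_lag
        self.counts = defaultdict(int)
        self.means = defaultdict(float)
        self.context = None

    def select(self):
        # Explore uniformly at random with probability epsilon.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_arms)
        # Otherwise pick the arm with the highest estimated mean in this context.
        return max(range(self.n_arms),
                   key=lambda a: self.means[(self.context, a)])

    def update(self, arm, reward):
        key = (self.context, arm)
        self.counts[key] += 1
        # Incremental update of the running mean for this (context, arm) pair.
        self.means[key] += (reward - self.means[key]) / self.counts[key]
        if self.use_lag:
            self.context = (arm, reward)


def run(use_lag, horizon=20000):
    """Return the average reward per step over one simulated run."""
    env = TwoStateEnv()
    agent = LaggedEpsGreedy(use_lag=use_lag)
    total = 0
    for _ in range(horizon):
        arm = agent.select()
        reward = env.pull(arm)
        agent.update(arm, reward)
        total += reward
    return total / horizon


if __name__ == "__main__":
    print("with lagged context:   ", run(use_lag=True))
    print("without lagged context:", run(use_lag=False))
```

In this toy, the lagged variant can learn a "stay after a win, switch after a loss" rule without ever modeling the hidden state, whereas the context-free variant averages rewards over both states. The comparison says nothing about the paper's reported results.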
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Recommender Systems and Techniques
