A Direct Approach for Handling Contextual Bandits with Latent State Dynamics

Zhen Li; Gilles Stoltz (LMO; CELESTE; HEC Paris)

arXiv:2604.08149·cs.LG·April 10, 2026

TL;DR

This paper introduces a new approach to finite-armed linear bandits with hidden Markov states: an adaptive strategy that estimates the HMM parameters online and achieves stronger, high-probability regret bounds that do not depend on the reward functions.

Contribution

It studies a more natural model in which rewards depend directly on the hidden states (on top of the observed contexts), and offers a fully adaptive strategy whose regret bounds do not depend on the reward functions.

Findings

01. Achieves high-probability regret bounds for the model

02. Estimates HMM parameters online in a fully adaptive manner

03. Bounds depend only on HMM parameter estimation, not reward functions

Abstract

We revisit the finite-armed linear bandit model by Nelson et al. (2022), where contexts and rewards are governed by a finite hidden Markov chain. Nelson et al. (2022) approach this model by a reduction to linear contextual bandits; but to do so, they actually introduce a simplification in which rewards are linear functions of the posterior probabilities over the hidden states given the observed contexts, rather than functions of the hidden states themselves. Their analysis (but not their algorithm) also does not take into account the estimation of the HMM parameters, and only tackles expected, not high-probability, bounds, which in addition suffer from unnecessarily complex dependencies on the model (like reward gaps). We instead study the more natural model incorporating direct dependencies on the hidden states (on top of dependencies on the observed contexts, as is natural for…
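The distinction drawn in the abstract, between rewards that are linear in the posterior over hidden states and rewards that depend on the hidden state itself, can be made concrete with a toy sketch. The code below is not the authors' algorithm, nor that of Nelson et al. (2022). It assumes a small HMM with known parameters and discrete contexts, and it ignores any direct dependence of rewards on the contexts. All names (`predict`, `condition`, `run_filter`, `belief_linear_reward`, `state_reward`, `theta_arm`) are hypothetical and chosen for illustration only.

```python
from typing import List, Sequence

Vector = List[float]


def predict(belief: Sequence[float], transition: Sequence[Sequence[float]]) -> Vector:
    """Push a belief over the current hidden state one step through the chain."""
    n = len(belief)
    return [sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)]


def condition(prior: Sequence[float], emission: Sequence[Sequence[float]], context: int) -> Vector:
    """Bayes update of a prior over hidden states after observing a discrete context."""
    unnormalized = [p * emission[s][context] for s, p in enumerate(prior)]
    total = sum(unnormalized)
    if total <= 0.0:
        raise ValueError("observed context has zero probability under the prior")
    return [u / total for u in unnormalized]


def run_filter(
    initial: Sequence[float],
    transition: Sequence[Sequence[float]],
    emission: Sequence[Sequence[float]],
    contexts: Sequence[int],
) -> List[Vector]:
    """Return the posterior over the hidden state after each observed context."""
    beliefs: List[Vector] = []
    prior = list(initial)
    for context in contexts:
        posterior = condition(prior, emission, context)
        beliefs.append(posterior)
        # The next round's prior is the current posterior moved through the chain.
        prior = predict(posterior, transition)
    return beliefs


def belief_linear_reward(belief: Sequence[float], theta_arm: Sequence[float]) -> float:
    """Simplified model (assumption): mean reward is linear in the posterior."""
    return sum(b * t for b, t in zip(belief, theta_arm))


def state_reward(state: int, theta_arm: Sequence[float]) -> float:
    """Direct model (assumption): mean reward is read off the true hidden state."""
    return theta_arm[state]
```

In this sketch, `belief_linear_reward` only ever sees the filtered posterior, whereas `state_reward` needs the hidden state that the learner never observes. In a real bandit problem the transition and emission matrices would also have to be estimated online, which is the part the abstract says the earlier analysis leaves out.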
