Dynamical Linear Bandits
Marco Mussi, Alberto Maria Metelli, Marcello Restelli

TL;DR
This paper introduces the Dynamical Linear Bandits framework, modeling sequential decision problems with delayed, evolving effects using a hidden state and linear dynamics, and proposes an algorithm with provable regret bounds.
Contribution
It extends linear bandits to include hidden states with linear dynamics, providing a new setting and an optimistic algorithm with theoretical regret guarantees.
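To make "hidden states with linear dynamics" concrete, here is a minimal noise-free sketch. It assumes a standard linear state-space form, x_{t+1} = A x_t + B u_t and y_t = <omega, x_t> + <theta, u_t>; the matrices A, B, the vectors omega, theta and the function name simulate_dlb are illustrative assumptions, not the paper's code. The rollout shows how a single action can produce no immediate reward but a delayed, slowly fading stream of rewards.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def mat_vec(m: Matrix, v: Sequence[float]) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def simulate_dlb(A: Matrix, B: Matrix, omega: Vector, theta: Vector,
                 actions: Sequence[Vector]) -> List[float]:
    """Noise-free rollout of an ASSUMED dynamical linear bandit:
        x_{t+1} = A x_t + B u_t,   y_t = <omega, x_t> + <theta, u_t>,
    starting from the hidden state x_0 = 0. Returns y_0, ..., y_{T-1}."""
    x = [0.0] * len(A)  # hidden state, never observed by the learner
    rewards = []
    for u in actions:
        # Reward mixes the accumulated hidden state and the current action.
        rewards.append(dot(omega, x) + dot(theta, u))
        # The action enters the state and keeps acting through A later on.
        bu = mat_vec(B, u)
        x = [ax + b for ax, b in zip(mat_vec(A, x), bu)]
    return rewards


# One unit of action at t = 0, then nothing (hypothetical scalar example).
rewards = simulate_dlb([[0.5]], [[1.0]], [1.0], [0.0],
                       [[1.0], [0.0], [0.0], [0.0], [0.0]])
print(rewards)  # [0.0, 1.0, 0.5, 0.25, 0.125]
```

Because theta is zero here, the action yields nothing at t = 0; its effect appears one step later and decays geometrically at rate 0.5, summing to 1 / (1 - 0.5) = 2 in the limit.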
Findings
Regret bound of order $\tilde{O}\left(d \frac{\sqrt{T}}{(1-\bar{\rho})^{3/2}}\right)$ for DynLin-UCB
Effective in synthetic and real-world environments compared to baselines
Introduces a novel dynamical structure in linear bandit models
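The regret bound's dependence on the dynamics can be read off numerically. The sketch below evaluates only the leading-order scale d * sqrt(T) / (1 - rho_bar)^(3/2), dropping the constants and logarithmic factors hidden by the O~ notation. It assumes, as the notation suggests, that rho_bar < 1 bounds how quickly the hidden state forgets past actions (for instance, a bound on the spectral radius of the state-transition matrix); the function name regret_scale is hypothetical.

```python
import math


def regret_scale(d: int, T: int, rho_bar: float) -> float:
    """Leading-order scale d * sqrt(T) / (1 - rho_bar)^(3/2) of the stated
    regret bound; constants and log factors (hidden by O~) are dropped."""
    if not 0.0 <= rho_bar < 1.0:
        raise ValueError("rho_bar must lie in [0, 1)")
    return d * math.sqrt(T) / (1.0 - rho_bar) ** 1.5


# Isolate the dynamics penalty by fixing d = T = 1.
for rho_bar in (0.0, 0.5, 0.9, 0.99):
    print(f"rho_bar={rho_bar}: factor={regret_scale(1, 1, rho_bar):.1f}")
```

The factor is about 1, 2.8, 31.6 and 1000 for the four values: the slower the system forgets (rho_bar close to 1), the longer an action's effects linger and the larger the bound becomes, while the sqrt(T) growth in the horizon is unchanged.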
Abstract
In many real-world sequential decision-making problems, an action is not immediately reflected in the feedback and spreads its effects over a long time frame. For instance, in online advertising, investing in a platform produces an instantaneous increase in awareness, but the actual reward, i.e., a conversion, might occur far in the future. Furthermore, whether a conversion takes place depends on how fast the awareness grows, how quickly its effects vanish, and its synergy or interference with other advertising platforms. Previous work has investigated the Multi-Armed Bandit framework with the possibility of delayed and aggregated feedback, but without a particular structure on how an action propagates in the future, disregarding possible dynamical effects. In this paper, we introduce a novel setting, the Dynamical Linear Bandits (DLB), an extension of linear bandits characterized by a hidden…
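The advertising example can be turned into a small worked computation of the long-run value of a constant budget split. This sketch assumes a linear hidden-state model x_{t+1} = A x_t + B u_t with reward y_t = <omega, x_t> + <theta, u_t>; under that assumption, holding the action u fixed drives the state to (I - A)^{-1} B u, so each action component is worth g = theta + B^T (I - A^T)^{-1} omega per unit in the long run. The two-platform numbers and the function name long_run_gain are hypothetical. The diagonal of A models how fast awareness vanishes, and an off-diagonal entry models one platform's awareness feeding another.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def mat_vec(m: Matrix, v: Sequence[float]) -> List[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def long_run_gain(A: Matrix, B: Matrix, omega: Sequence[float],
                  theta: Sequence[float], iters: int = 200) -> List[float]:
    """Per-unit steady-state reward of each action component under the
    ASSUMED model x_{t+1} = A x_t + B u_t, y_t = <omega, x_t> + <theta, u_t>.
    Returns g = theta + B^T (I - A^T)^{-1} omega, computed by the Neumann-series
    iteration z <- A^T z + omega (converges if the spectral radius of A < 1)."""
    At = transpose(A)
    z = list(omega)
    for _ in range(iters):
        z = [az + w for az, w in zip(mat_vec(At, z), omega)]
    return [t + bz for t, bz in zip(theta, mat_vec(transpose(B), z))]


# Hypothetical two-platform example: awareness halves each step on both
# platforms, and platform 2's awareness also feeds platform 1's (A[0][1]).
A = [[0.5, 0.25], [0.0, 0.5]]
B = [[1.0, 0.0], [0.0, 1.0]]  # each platform's spend raises its own awareness
print(long_run_gain(A, B, omega=[1.0, 1.0], theta=[0.0, 0.0]))
```

Here the gain is approximately [2.0, 3.0]: because of the coupling, a unit spent on the second platform is worth more in the long run than one spent on the first, whereas with A[0][1] = 0 both would be worth 2. A myopic learner that only looks at immediate rewards would see no difference at all, since theta is zero.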
Taxonomy
Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management
