Non-Stationary Bandits with Auto-Regressive Temporal Dependency
Qinyi Chen, Negin Golrezaei, Djallel Bouneffouf

TL;DR
This paper presents a new non-stationary multi-armed bandit framework with an auto-regressive reward structure and an algorithm that balances exploration and exploitation as rewards evolve over time. The approach is supported by near-matching regret upper and lower bounds and by a real-world case study on tourism demand prediction.
Contribution
The paper introduces a novel non-stationary MAB model with auto-regressive rewards, together with an algorithm that exploits temporal dependencies and achieves a near-optimal regret bound.
Findings
The algorithm achieves a regret upper bound that nearly matches the lower bound.
The algorithm is effective in a real-world case study on tourism demand prediction.
The case study demonstrates applicability to complex, evolving time series.
Abstract
Traditional multi-armed bandit (MAB) frameworks, predominantly examined under stochastic or adversarial settings, often overlook the temporal dynamics inherent in many real-world applications such as recommendation systems and online advertising. This paper introduces a novel non-stationary MAB framework that captures the temporal structure of these real-world dynamics through an auto-regressive (AR) reward structure. We propose an algorithm that integrates two key mechanisms: (i) an alternation mechanism adept at leveraging temporal dependencies to dynamically balance exploration and exploitation, and (ii) a restarting mechanism designed to discard out-of-date information. Our algorithm achieves a regret upper bound that nearly matches the lower bound, with regret measured against a robust dynamic benchmark. Finally, via a real-world case study on tourism demand prediction, we…
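To make the auto-regressive reward structure concrete, the following is a minimal sketch, not the authors' algorithm or their exact model. It assumes each arm's reward follows a plain AR(1) recursion, r_i(t+1) = alpha * r_i(t) + Gaussian noise, starting from zero. The function names, the parameters `alpha` and `sigma`, and the per-round oracle used as a stand-in for a dynamic benchmark are all illustrative assumptions, since the summary above does not specify them.

```python
import random


def simulate_ar1_rewards(num_arms, horizon, alpha, sigma, seed=0):
    """Return history[t][i], the reward of arm i in round t.

    Assumed model (illustrative only): r_i(t+1) = alpha * r_i(t) + noise,
    with noise drawn from a zero-mean Gaussian with standard deviation sigma.
    """
    rng = random.Random(seed)
    current = [0.0] * num_arms  # assumed initial state: all rewards start at 0
    history = []
    for _ in range(horizon):
        history.append(list(current))
        # AR(1) step applied independently to every arm.
        current = [alpha * r + rng.gauss(0.0, sigma) for r in current]
    return history


def oracle_total(history):
    """Total reward of a per-round oracle that always pulls the best arm."""
    return sum(max(row) for row in history)


def fixed_arm_total(history, arm):
    """Total reward of a policy that pulls the same arm in every round."""
    return sum(row[arm] for row in history)


def dynamic_regret(history, arm):
    """Regret of the fixed-arm policy measured against the per-round oracle."""
    return oracle_total(history) - fixed_arm_total(history, arm)


if __name__ == "__main__":
    history = simulate_ar1_rewards(num_arms=3, horizon=200, alpha=0.9, sigma=0.1, seed=1)
    print("dynamic regret of always pulling arm 0:", dynamic_regret(history, arm=0))
```

Under this assumed model, a value of `alpha` close to 1 makes rewards persist across rounds, so recent observations stay informative for longer. A value close to 0 makes each round nearly independent of the last, which is the kind of setting where discarding out-of-date information becomes more important.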
Taxonomy
Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Reinforcement Learning in Robotics
