Non-Stationary Bandits with Auto-Regressive Temporal Dependency

Qinyi Chen; Negin Golrezaei; Djallel Bouneffouf

arXiv:2210.16386 · cs.LG · December 13, 2023

TL;DR

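This paper presents a non-stationary multi-armed bandit (MAB) framework in which rewards follow an auto-regressive (AR) structure. Its algorithm balances exploration and exploitation by alternating between them according to the temporal dependencies, and restarts to discard out-of-date information. The approach is supported by a regret upper bound that nearly matches the lower bound and by a real-world case study on tourism demand prediction.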

Contribution

The paper introduces a non-stationary MAB model with auto-regressive rewards, together with an algorithm that leverages temporal dependencies and attains a regret upper bound nearly matching the lower bound.

Findings

01 The algorithm's regret upper bound nearly matches the lower bound.

02 The algorithm is effective in a real-world case study on tourism demand prediction.

03 The approach is applicable to complex, evolving time series.

Abstract

Traditional multi-armed bandit (MAB) frameworks, predominantly examined under stochastic or adversarial settings, often overlook the temporal dynamics inherent in many real-world applications such as recommendation systems and online advertising. This paper introduces a novel non-stationary MAB framework that captures the temporal structure of these real-world dynamics through an auto-regressive (AR) reward structure. We propose an algorithm that integrates two key mechanisms: (i) an alternation mechanism adept at leveraging temporal dependencies to dynamically balance exploration and exploitation, and (ii) a restarting mechanism designed to discard out-of-date information. Our algorithm achieves a regret upper bound that nearly matches the lower bound, with regret measured against a robust dynamic benchmark. Finally, via a real-world case study on tourism demand prediction, we…
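To make the setting in the abstract concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a plain AR(1) recursion r[t+1] = alpha * r[t] + noise with Gaussian noise (the exact reward model is not stated in this summary), and it uses a toy policy that restarts on a fixed schedule, pulls every arm once, and then follows the arm with the highest last-seen reward. The function names, the fixed epoch length, and the per-round best-arm benchmark are all illustrative assumptions.

```python
import numpy as np


def simulate_ar1_rewards(n_arms, horizon, alpha, sigma, seed=0):
    """Simulate per-arm rewards under an ASSUMED AR(1) model:
    r[t] = alpha * r[t-1] + Gaussian noise with std sigma."""
    rng = np.random.default_rng(seed)
    rewards = np.zeros((horizon, n_arms))
    rewards[0] = rng.normal(0.0, sigma, size=n_arms)
    for t in range(1, horizon):
        noise = rng.normal(0.0, sigma, size=n_arms)
        rewards[t] = alpha * rewards[t - 1] + noise
    return rewards


def restarting_greedy(rewards, epoch_len):
    """Toy stand-in policy (hypothetical, not the paper's method).

    At the start of each epoch all stored observations are discarded
    (the restart). The first n_arms rounds of the epoch pull each arm
    once; the remaining rounds pull the arm whose most recent observed
    reward is highest."""
    horizon, n_arms = rewards.shape
    pulls = np.empty(horizon, dtype=int)
    last_seen = np.full(n_arms, -np.inf)
    for t in range(horizon):
        step = t % epoch_len
        if step == 0:
            last_seen[:] = -np.inf  # restart: forget stale information
        if step < n_arms:
            arm = step  # exploration: visit each arm once per epoch
        else:
            arm = int(np.argmax(last_seen))  # exploitation
        pulls[t] = arm
        last_seen[arm] = rewards[t, arm]
    return pulls


def dynamic_regret(rewards, pulls):
    """Regret against the best arm in every round, a simple dynamic
    benchmark chosen here for illustration."""
    best = rewards.max(axis=1)
    earned = rewards[np.arange(len(pulls)), pulls]
    return float(np.sum(best - earned))


rewards = simulate_ar1_rewards(n_arms=3, horizon=200, alpha=0.9, sigma=0.1, seed=1)
pulls = restarting_greedy(rewards, epoch_len=20)
print("dynamic regret:", dynamic_regret(rewards, pulls))
```

With alpha close to 1, past observations stay informative for longer, so a longer epoch wastes fewer rounds on re-exploration; with alpha close to 0, old observations carry little signal, which is the intuition behind discarding them.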

Videos

Non-Stationary Bandits with Auto-Regressive Temporal Dependency · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Reinforcement Learning in Robotics
