Revisiting Weighted Strategy for Non-stationary Parametric Bandits and MDPs

Jing Wang; Peng Zhao; Zhi-Hua Zhou

arXiv:2601.01069 · cs.LG · January 6, 2026

TL;DR

This paper refines the analysis of weighted strategies for non-stationary parametric bandits, leading to simpler algorithms with improved regret bounds and extending these insights to non-stationary MDPs with function approximation.

Contribution

It introduces a refined analysis framework for weighted strategies, simplifying algorithm design and improving regret bounds for linear, generalized linear, and self-concordant bandits, and extends to non-stationary MDPs.

Findings

01. Simpler weight-based algorithms matching the efficiency of window/restart strategies.

02. Improved regret bounds for generalized linear bandits.

03. Extension of the framework to non-stationary MDPs with function approximation.

Abstract

Non-stationary parametric bandits have attracted much attention recently. There are three principled ways to deal with non-stationarity, namely sliding-window, weighted, and restart strategies. As many non-stationary environments exhibit gradual drifting patterns, the weighted strategy is commonly adopted in real-world applications. However, previous theoretical studies show that its analysis is more involved and the algorithms are either computationally less efficient or statistically suboptimal. This paper revisits the weighted strategy for non-stationary parametric bandits. In linear bandits (LB), we discover that this undesirable feature is due to an inadequate regret analysis, which results in an overly complex algorithm design. We propose a refined analysis framework, which simplifies the derivation and, importantly, produces a simpler weight-based algorithm that is as…
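
A common instance of the weighted strategy mentioned above is exponentially discounted ridge regression: at round t, observation s receives weight gamma^(t-s), so old data fades geometrically while the model parameter drifts. The Python/NumPy sketch below illustrates only that generic idea and is not the authors' algorithm. The class name `DiscountedRidge`, the discount `gamma`, the undiscounted regulariser `lam * I`, and the LinUCB-style bonus with a user-supplied radius `beta` are all hypothetical choices made for illustration.

```python
import numpy as np


class DiscountedRidge:
    """Hypothetical sketch of an exponentially weighted ridge estimator.

    At round t it maintains
        V_t = lam * I + sum_s gamma**(t - s) * x_s x_s^T
        b_t = sum_s gamma**(t - s) * r_s * x_s
    (assumption: the regulariser lam * I itself is not discounted).
    """

    def __init__(self, dim: int, gamma: float = 0.95, lam: float = 1.0) -> None:
        if not 0.0 < gamma <= 1.0:
            raise ValueError("gamma must lie in (0, 1]")
        self.gamma = gamma
        self.reg = lam * np.eye(dim)
        self.V = self.reg.copy()  # weighted Gram matrix plus regulariser
        self.b = np.zeros(dim)    # weighted sum of reward-scaled features

    def update(self, x: np.ndarray, r: float) -> None:
        # Discount all past data by gamma, keep lam * I fixed, then add the new point.
        self.V = self.gamma * (self.V - self.reg) + self.reg + np.outer(x, x)
        self.b = self.gamma * self.b + r * x

    def estimate(self) -> np.ndarray:
        # Weighted least-squares estimate theta_hat = V^{-1} b.
        return np.linalg.solve(self.V, self.b)

    def ucb(self, x: np.ndarray, beta: float) -> float:
        # Generic optimistic score <x, theta_hat> + beta * ||x||_{V^{-1}};
        # beta is a hypothetical user-supplied radius, not a tuned confidence bound.
        theta = self.estimate()
        width = np.sqrt(x @ np.linalg.solve(self.V, x))
        return float(x @ theta + beta * width)
```

With `gamma = 1` the estimator reduces to ordinary ridge regression; a smaller `gamma` forgets old observations geometrically faster, trading higher variance for lower bias when the parameter drifts.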

Taxonomy

Topics: Advanced Bandit Algorithms Research · Age of Information Optimization · Gaussian Processes and Bayesian Inference