A Practical Algorithm for Feature-Rich, Non-Stationary Bandit Problems

Wei Min Loh; Sajib Kumer Sinha; Ankur Agarwal; Pascal Poupart

arXiv:2603.16755 · cs.LG · March 18, 2026 · Trans. Mach. Learn. Res.

TL;DR

This paper introduces a new algorithm for non-stationary, feature-rich contextual bandit problems that adaptively learns reward functions and correlations over time, improving performance in recommendation and online learning tasks.

Contribution

It proposes a novel conditionally coupled contextual (C3) Thompson sampling algorithm that handles non-linear rewards and reward distributions that change over time, without retraining.

Findings

1. C3 achieves 5.7% lower average cumulative regret than the next-best algorithm on four OpenML tabular datasets.
2. It achieves a 12.4% click lift on the Microsoft News Dataset (MIND).
3. It demonstrates effective online learning in non-stationary, feature-rich environments.
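For reference, the regret figure above is normally read against the standard definition of cumulative regret. This page does not say how the paper averages it (over datasets, seeds, or both), which baseline is used, or whether 5.7% is a relative or an absolute reduction, so the following is a sketch assuming the usual definitions and a relative reduction. Here $\mu_{t,a}$ is the expected reward of arm $a$ at round $t$ (indexed by $t$ because rewards are non-stationary) and $a_t$ is the arm chosen at round $t$:

```latex
R_T = \sum_{t=1}^{T} \left( \mu_t^{\star} - \mu_{t,a_t} \right),
\qquad \mu_t^{\star} = \max_{a} \mu_{t,a}

\text{relative regret reduction} = \frac{R_T^{\text{base}} - R_T^{\text{C3}}}{R_T^{\text{base}}},
\qquad
\text{click lift} = \frac{\mathrm{CTR}^{\text{C3}} - \mathrm{CTR}^{\text{base}}}{\mathrm{CTR}^{\text{base}}}
```

Under that reading, 5.7% lower regret means $R_T^{\text{C3}} \approx 0.943\, R_T^{\text{base}}$.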

Abstract

Contextual bandits are incredibly useful in many practical problems. We go one step further by devising a more realistic problem that combines: (1) contextual bandits with dense arm features, (2) non-linear reward functions, and (3) a generalization of correlated bandits where reward distributions change over time but the degree of correlation is maintained. This formulation lends itself to a wider set of applications, such as recommendation tasks. To solve this problem, we introduce conditionally coupled contextual (C3) Thompson sampling for Bernoulli bandits. It combines an improved Nadaraya-Watson estimator on an embedding space with Thompson sampling, which allows online learning without retraining. Empirical results show that C3 achieves 5.7% lower average cumulative regret than the next-best algorithm on four OpenML tabular datasets, as well as a 12.4% click lift on Microsoft…
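To make the abstract's description more concrete, here is a minimal sketch of one way to couple a Nadaraya-Watson estimator with Bernoulli Thompson sampling: kernel weights over past pulls in an embedding space become fractional Beta pseudo-counts, so each new observation is simply appended and no model is retrained. This is not the authors' C3 algorithm and does not include their "improved" estimator. The Gaussian kernel, the `bandwidth`, the exponential `discount` used to down-weight stale observations under non-stationarity, the class name `KernelBetaThompson`, and the toy click model in the usage section are all hypothetical choices for illustration.

```python
import numpy as np


class KernelBetaThompson:
    """Hypothetical sketch, NOT the authors' C3 method: Bernoulli Thompson
    sampling whose Beta pseudo-counts come from Nadaraya-Watson-style
    Gaussian kernel weights over past pulls in an embedding space."""

    def __init__(self, bandwidth=1.0, discount=1.0, prior_a=1.0, prior_b=1.0, seed=0):
        self.bandwidth = bandwidth  # assumed Gaussian kernel width h
        self.discount = discount    # assumed per-step decay for stale observations
        self.prior_a = prior_a      # Beta prior pseudo-successes
        self.prior_b = prior_b      # Beta prior pseudo-failures
        self.rng = np.random.default_rng(seed)
        self.X, self.r, self.t = [], [], []  # embeddings, rewards, time stamps
        self.step = 0

    def _pseudo_counts(self, Z):
        """Kernel-weighted (alpha, beta) for each candidate embedding row of Z."""
        Z = np.asarray(Z, dtype=float)
        if not self.X:
            n = Z.shape[0]
            return np.full(n, self.prior_a), np.full(n, self.prior_b)
        X = np.asarray(self.X)                    # (n_obs, d)
        r = np.asarray(self.r)                    # (n_obs,)
        age = self.step - np.asarray(self.t)      # rounds since each observation
        sq_dist = ((Z[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)  # (n_arms, n_obs)
        w = np.exp(-sq_dist / (2.0 * self.bandwidth ** 2)) * self.discount ** age
        return self.prior_a + w @ r, self.prior_b + w @ (1.0 - r)

    def nw_estimate(self, Z):
        """Posterior mean; equals the Nadaraya-Watson estimate when both priors are 0."""
        a, b = self._pseudo_counts(Z)
        return a / (a + b)

    def select(self, Z):
        """Draw one Beta sample per candidate and return the argmax index."""
        a, b = self._pseudo_counts(Z)
        return int(np.argmax(self.rng.beta(a, b)))

    def update(self, z, reward):
        """Online update: append the observation; nothing is retrained."""
        self.X.append(np.asarray(z, dtype=float))
        self.r.append(float(reward))
        self.t.append(self.step)
        self.step += 1


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    arms = rng.normal(size=(5, 3))                     # hypothetical dense arm embeddings
    true_p = 1.0 / (1.0 + np.exp(-2.0 * arms[:, 0]))   # hypothetical non-linear click model
    agent = KernelBetaThompson(bandwidth=0.5, discount=0.99)
    for _ in range(500):
        k = agent.select(arms)
        agent.update(arms[k], rng.random() < true_p[k])
    print("estimated:", np.round(agent.nw_estimate(arms), 2))
    print("true:     ", np.round(true_p, 2))
```

With both Beta priors set to zero, the posterior mean reduces exactly to the Nadaraya-Watson estimate $\sum_i w_i r_i / \sum_i w_i$; the nonzero priors only keep the Beta draw well defined for candidates far from any observed embedding.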

Taxonomy

Topics: Advanced Bandit Algorithms Research · Recommender Systems and Techniques · Advanced Causal Inference Techniques