A Practical Algorithm for Feature-Rich, Non-Stationary Bandit Problems
Wei Min Loh, Sajib Kumer Sinha, Ankur Agarwal, Pascal Poupart

TL;DR
This paper introduces a new algorithm for non-stationary, feature-rich contextual bandit problems that adaptively learns reward functions and correlations over time, improving performance in recommendation and online learning tasks.
Contribution
It proposes a novel conditionally coupled contextual (C3) Thompson sampling algorithm that handles non-linear rewards and reward distributions that change over time, learning online without retraining.
Findings
C3 achieves 5.7% lower average cumulative regret than the next best algorithm on four OpenML tabular datasets.
Achieves a 12.4% click lift on the Microsoft News Dataset (MIND).
Demonstrates effective online learning in non-stationary, feature-rich environments.
Abstract
Contextual bandits are incredibly useful in many practical problems. We go one step further by devising a more realistic problem that combines: (1) contextual bandits with dense arm features, (2) non-linear reward functions, and (3) a generalization of correlated bandits where reward distributions change over time but the degree of correlation is maintained. This formulation lends itself to a wider set of applications such as recommendation tasks. To solve this problem, we introduce conditionally coupled contextual (C3) Thompson sampling for Bernoulli bandits. It combines an improved Nadaraya-Watson estimator on an embedding space with Thompson sampling, which allows online learning without retraining. Empirical results show that C3 achieves 5.7% lower average cumulative regret than the next best algorithm on four OpenML tabular datasets and demonstrates a 12.4% click lift on Microsoft…
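To make the general idea in the abstract concrete, the sketch below pairs a plain Nadaraya-Watson style kernel weighting over arm embeddings with Beta-Bernoulli Thompson sampling. It is not the paper's C3 algorithm: the "improved" estimator, the learned embedding space, and the handling of non-stationarity are not described in this summary, so none of them are reproduced here. The class name `KernelThompsonSampler`, the Gaussian kernel, the `bandwidth` parameter, and the uniform Beta(1, 1) prior are all assumptions made for illustration.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Similarity between two embeddings (assumed kernel, not from the paper)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


class KernelThompsonSampler:
    """Illustrative kernel-smoothed Beta-Bernoulli Thompson sampling.

    Hypothetical helper: each past (embedding, reward) pair contributes a
    kernel-weighted pseudo-count to the Beta posterior of a queried arm.
    """

    def __init__(self, bandwidth=1.0, prior_alpha=1.0, prior_beta=1.0, seed=None):
        self.bandwidth = bandwidth
        self.prior_alpha = prior_alpha
        self.prior_beta = prior_beta
        self.history = []  # list of (embedding, reward) with reward in {0, 1}
        self.rng = random.Random(seed)

    def posterior_params(self, embedding):
        """Kernel-weighted success/failure counts around `embedding`."""
        alpha, beta = self.prior_alpha, self.prior_beta
        for past_embedding, reward in self.history:
            weight = gaussian_kernel(embedding, past_embedding, self.bandwidth)
            alpha += weight * reward
            beta += weight * (1 - reward)
        return alpha, beta

    def select(self, arm_embeddings):
        """Sample a reward probability per arm and return the best arm's index."""
        samples = [
            self.rng.betavariate(*self.posterior_params(e)) for e in arm_embeddings
        ]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, embedding, reward):
        """Record an observation; no model is refit."""
        self.history.append((tuple(embedding), reward))
```

The posterior mean alpha / (alpha + beta) is the Nadaraya-Watson estimate sum(w_i * r_i) / sum(w_i), shrunk toward the prior mean, so nearby observations in embedding space inform arms that have never been pulled. In use, a caller would alternate `select(arm_embeddings)` and `update(embedding, reward)` each round. This sketch weights all past observations equally in time, so it does not by itself adapt to changing reward distributions.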
Taxonomy
Topics: Advanced Bandit Algorithms Research · Recommender Systems and Techniques · Advanced Causal Inference Techniques
