Contextual Bandits for adapting to changing User preferences over time
Dattaraj Rao

TL;DR
This paper introduces a novel contextual bandit algorithm using multiple SGD learners to adapt to changing user preferences over time, demonstrated on simulated and MovieLens datasets.
Contribution
It develops a new algorithm for contextual bandits with multiple learners per action, addressing limitations of static models in dynamic environments.
Findings
A static classifier's accuracy drops as the context changes.
The new algorithm adapts better to evolving user preferences.
Results show improved prediction of movie ratings over time.
Abstract
Contextual bandits provide an effective way to model the dynamic data problem in ML by leveraging online (incremental) learning to continuously adjust predictions based on a changing environment. We explore contextual bandits, an extension of the traditional reinforcement learning (RL) problem, and build a novel algorithm to solve this problem using an array of action-based learners. We apply this approach to model an article recommendation system, using an array of stochastic gradient descent (SGD) learners to predict rewards based on actions taken. We then extend the approach to the publicly available MovieLens dataset and explore the findings. First, we make available a simplified simulated dataset showing varying user preferences over time and how this can be evaluated with static and dynamic learning algorithms. This dataset made available as part of this…
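The idea of an array of action-based SGD learners can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes one linear learner per action (one reading of "an array of action-based learners"), epsilon-greedy exploration, and a toy two-user simulation in which preferences flip halfway through. All names (`SGDLearner`, `PerActionBandit`, `simulate`, `mean_tail`) and all parameter values are hypothetical.

```python
import random


class SGDLearner:
    """Linear reward model updated one sample at a time (squared-error SGD)."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        return self.b + sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, x, reward):
        err = self.predict(x) - reward  # derivative of 0.5 * err**2 w.r.t. the prediction
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


class PerActionBandit:
    """Assumed design: one learner per action, epsilon-greedy action choice."""

    def __init__(self, n_actions, n_features, epsilon=0.1, seed=0):
        self.learners = [SGDLearner(n_features) for _ in range(n_actions)]
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def choose(self, x):
        # Explore with probability epsilon, otherwise pick the best predicted reward.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.learners))
        preds = [learner.predict(x) for learner in self.learners]
        return max(range(len(preds)), key=preds.__getitem__)

    def learn(self, x, action, reward):
        # Only the learner for the action actually taken sees the reward.
        self.learners[action].update(x, reward)


def simulate(n_rounds=4000, keep_learning=True, seed=0):
    """Toy data (not the paper's dataset): two users whose preferred action flips halfway."""
    rng = random.Random(seed)
    bandit = PerActionBandit(n_actions=2, n_features=2, seed=seed + 1)
    shift = n_rounds // 2
    rewards = []
    for t in range(n_rounds):
        user = rng.randrange(2)
        x = [1.0, 0.0] if user == 0 else [0.0, 1.0]  # one-hot user context
        preferred = user if t < shift else 1 - user
        action = bandit.choose(x)
        reward = 1.0 if action == preferred else 0.0
        # keep_learning=False freezes the model at the shift, mimicking a static model.
        if keep_learning or t < shift:
            bandit.learn(x, action, reward)
        rewards.append(reward)
    return rewards


def mean_tail(rewards, n=500):
    """Average reward over the last n rounds."""
    return sum(rewards[-n:]) / n
```

Comparing `mean_tail(simulate(keep_learning=True))` with `mean_tail(simulate(keep_learning=False))` gives a rough picture of the contrast the abstract describes: in this toy setup the learner that keeps updating should recover after the preference flip, while the frozen one keeps recommending the outdated choice.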
Taxonomy
Topics: Advanced Bandit Algorithms Research · Recommender Systems and Techniques · Data Stream Mining Techniques
