Deep Upper Confidence Bound Algorithm for Contextual Bandit Ranking of Information Selection

Michael Rawson; Jade Freeman

arXiv:2110.04127 · cs.LG · January 31, 2022

TL;DR

This paper introduces Deep UCB, a novel algorithm for non-linear contextual bandit ranking that uses deep neural networks. It establishes theoretical regret bounds and demonstrates improved performance on high-dimensional, real-world data.

Contribution

We propose Deep UCB, a new deep learning-based algorithm for non-linear contextual bandits that relaxes linearity assumptions and provides theoretical regret guarantees.

Findings

1. Deep UCB often outperforms other bandit algorithms in experiments.
2. Deep UCB is sensitive to the problem and reward setup.
3. Theoretical regret bounds are established for Deep UCB.
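To make "regret" concrete for top-K selection, one standard formulation is sketched below. It is an assumption and is not taken from the paper, whose exact definition and bound are not reproduced on this page. Here x_t is the context at round t, S_t is the set of K arms selected, S_t^* is the set of K arms with the highest expected reward, and \mu(x_t, a) = \mathbb{E}[r_{t,a} \mid x_t] is the (possibly non-linear) expected reward of arm a.

```latex
R(T) = \sum_{t=1}^{T} \left( \sum_{a \in S_t^{*}} \mu(x_t, a) - \sum_{a \in S_t} \mu(x_t, a) \right),
\qquad |S_t| = |S_t^{*}| = K
```

A regret bound that grows sublinearly in T implies that the average per-round shortfall R(T)/T tends to zero. In that case the ranker eventually selects near-optimal top-K sets.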

Abstract

Contextual multi-armed bandits (CMAB) have been widely used for learning to filter and prioritize information according to a user's interest. In this work, we analyze top-K ranking under the CMAB framework where the top-K arms are chosen iteratively to maximize a reward. The context, which represents a set of observable factors related to the user, is used to increase prediction accuracy compared to a standard multi-armed bandit. Contextual bandit methods have mostly been studied under strict linearity assumptions, but we drop that assumption and learn non-linear stochastic reward functions with deep neural networks. We introduce a novel algorithm called the Deep Upper Confidence Bound (UCB) algorithm. Deep UCB balances exploration and exploitation with a separate neural network to model the learning convergence. We compare the performance of many bandit algorithms varying K over…
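The abstract describes choosing the top-K arms iteratively. Each arm is scored by a reward network plus an exploration term that comes from a separate network modeling learning convergence. The Python sketch below illustrates that idea under loudly stated assumptions, and it is not the authors' implementation. The names `MLP` and `DeepUCBRanker` and the encoding of the context concatenated with a one-hot arm are hypothetical. The "convergence" network here regresses the absolute prediction residual of the reward network, the bonus weight is `alpha`, and the synthetic sine reward stands in for real data. The paper's architecture, training targets, and confidence term may all differ.

```python
import numpy as np


class MLP:
    """Tiny one-hidden-layer regressor trained with per-example SGD on squared error."""

    def __init__(self, n_in, n_hidden, lr, rng):
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), size=n_hidden)
        self.b2 = 0.0
        self.lr = lr

    def predict(self, X):
        # X has shape (n, n_in); returns n scalar predictions.
        return np.tanh(X @ self.W1 + self.b1) @ self.w2 + self.b2

    def update(self, x, y):
        # One gradient step on 0.5 * (prediction - y)**2 for a single input x.
        h = np.tanh(x @ self.W1 + self.b1)
        err = h @ self.w2 + self.b2 - y
        grad_h = err * self.w2 * (1.0 - h ** 2)  # uses w2 before it is updated
        self.w2 -= self.lr * err * h
        self.b2 -= self.lr * err
        self.W1 -= self.lr * np.outer(x, grad_h)
        self.b1 -= self.lr * grad_h


class DeepUCBRanker:
    """Hypothetical Deep-UCB-style top-K ranker: reward net + separate error net."""

    def __init__(self, n_arms, ctx_dim, k, alpha=1.0, n_hidden=32, lr=0.05, seed=0):
        if not 0 < k <= n_arms:
            raise ValueError("k must be between 1 and n_arms")
        rng = np.random.default_rng(seed)
        self.n_arms, self.k, self.alpha = n_arms, k, alpha
        n_in = ctx_dim + n_arms
        self.reward_net = MLP(n_in, n_hidden, lr, rng)  # estimates the reward
        self.error_net = MLP(n_in, n_hidden, lr, rng)   # estimates |reward - estimate|

    def _features(self, context, arms):
        # Assumed encoding: the context concatenated with a one-hot arm indicator.
        onehot = np.eye(self.n_arms)[arms]
        return np.hstack([np.tile(context, (len(arms), 1)), onehot])

    def rank(self, context):
        # Pick K arms one at a time by optimistic (UCB-style) score.
        remaining, chosen = list(range(self.n_arms)), []
        for _ in range(self.k):
            X = self._features(context, remaining)
            ucb = self.reward_net.predict(X) + self.alpha * np.abs(self.error_net.predict(X))
            best = remaining[int(np.argmax(ucb))]
            chosen.append(best)
            remaining.remove(best)
        return chosen

    def update(self, context, arms, rewards):
        # Only the rewards of the arms that were shown are observed (bandit feedback).
        X = self._features(context, arms)
        for x, r in zip(X, rewards):
            residual = abs(r - self.reward_net.predict(x[None, :])[0])
            self.error_net.update(x, residual)  # learns how wrong the reward net still is
            self.reward_net.update(x, r)


# Usage on a synthetic, non-linear reward (purely illustrative).
rng = np.random.default_rng(1)
n_arms, ctx_dim, k = 10, 5, 3
theta = rng.normal(size=(n_arms, ctx_dim))
ranker = DeepUCBRanker(n_arms, ctx_dim, k)
for t in range(500):
    x = rng.normal(size=ctx_dim)
    picked = ranker.rank(x)
    rewards = np.sin(theta[picked] @ x) + 0.1 * rng.normal(size=k)
    ranker.update(x, picked, rewards)
```

In this sketch the scores do not depend on arms already placed, so the iterative loop returns the same set as sorting by UCB score. A position-aware or redundancy-aware variant would differ at the step where scores are recomputed after each pick. The exploration bonus shrinks for inputs where the error network has learned that the reward network is already accurate, which is one simple way to encode "learning convergence".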

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Reinforcement Learning in Robotics