Top-K Ranking Deep Contextual Bandits for Information Selection Systems

Jade Freeman; Michael Rawson

arXiv:2201.13287·cs.LG·February 1, 2022

TL;DR

This paper introduces a neural network-based method for top-K ranking in contextual bandits, improving content filtering and prioritization by modeling complex reward functions with high-dimensional data.

Contribution

It presents a novel neural network approach for top-K ranking in contextual bandits, handling non-linear reward structures and high-dimensional features.

Findings

01. Performs well with complex reward structures

02. Effective with high-dimensional contextual features

03. Outperforms traditional methods in experiments

Abstract

In today's technology environment, information is abundant, dynamic, and heterogeneous in nature. Automated filtering and prioritization of information rests on distinguishing whether the information adds substantial value toward one's goal. Contextual multi-armed bandits have been widely used for learning to filter content and prioritize it according to user interest or relevance. The Learn-to-Rank technique optimizes the relevance ranking of items, allowing content to be selected accordingly. We propose a novel approach to top-K ranking under the contextual multi-armed bandit framework. We model the stochastic reward function with a neural network, allowing a non-linear approximation to learn the relationship between rewards and contexts. We demonstrate the approach and evaluate the performance of learning from the experiments using real world data sets in simulated…
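
To make the idea in the abstract concrete, the following is a minimal sketch of top-K selection with a neural reward model in a contextual bandit loop. It is not the authors' algorithm: the abstract does not state the exploration strategy, network architecture, or reward data, so this sketch assumes epsilon-greedy exploration, a one-hidden-layer network trained by plain gradient descent, and a synthetic sine-shaped reward. All names (`NeuralRewardModel`, `select_top_k`, `run_simulation`) are hypothetical.

```python
import numpy as np


class NeuralRewardModel:
    """One-hidden-layer MLP mapping a context vector to a predicted reward.

    Assumption: this architecture is illustrative, not the one in the paper.
    """

    def __init__(self, dim, hidden=16, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(dim), size=(dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), size=hidden)
        self.b2 = 0.0
        self.lr = lr

    def predict(self, X):
        """Return predicted rewards, one per row of X."""
        H = np.tanh(X @ self.W1 + self.b1)
        return H @ self.w2 + self.b2

    def update(self, X, r):
        """Take one gradient step on 0.5 * mean squared error."""
        n = len(r)
        H = np.tanh(X @ self.W1 + self.b1)
        err = (H @ self.w2 + self.b2) - r            # shape (n,)
        dH = np.outer(err, self.w2) * (1.0 - H**2)   # backprop through tanh
        # Compute all gradients before changing any parameter.
        grad_W1 = X.T @ dH / n
        grad_b1 = dH.mean(axis=0)
        grad_w2 = H.T @ err / n
        grad_b2 = err.mean()
        self.W1 -= self.lr * grad_W1
        self.b1 -= self.lr * grad_b1
        self.w2 -= self.lr * grad_w2
        self.b2 -= self.lr * grad_b2


def select_top_k(model, contexts, k, epsilon, rng):
    """Return indices of k arms: a random slate with probability epsilon,
    otherwise the k arms with the highest predicted reward, best first."""
    n_arms = contexts.shape[0]
    if rng.random() < epsilon:
        return rng.choice(n_arms, size=k, replace=False)
    scores = model.predict(contexts)
    return np.argsort(-scores)[:k]


def run_simulation(rounds=500, n_arms=20, dim=5, k=3, epsilon=0.1, seed=0):
    """Run the bandit loop on a synthetic non-linear reward (an assumption)."""
    rng = np.random.default_rng(seed)
    model = NeuralRewardModel(dim, seed=seed)
    theta = rng.normal(size=dim)  # hidden parameter of the synthetic reward
    total = 0.0
    for _ in range(rounds):
        contexts = rng.normal(size=(n_arms, dim))
        slate = select_top_k(model, contexts, k, epsilon, rng)
        # Bandit feedback: rewards are observed only for the selected arms.
        rewards = np.sin(contexts[slate] @ theta) + 0.1 * rng.normal(size=k)
        model.update(contexts[slate], rewards)
        total += rewards.sum()
    return model, total / (rounds * k)


if __name__ == "__main__":
    _, avg_reward = run_simulation()
    print(f"average reward per selected item: {avg_reward:.3f}")
```

The key property the sketch illustrates is bandit feedback: the model is updated only on the K items it chose to show, so some exploration (here, the random slate) is needed to learn about items the current model ranks low.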
