Top-K Ranking Deep Contextual Bandits for Information Selection Systems
Jade Freeman, Michael Rawson

TL;DR
This paper introduces a neural network-based method for top-K ranking in contextual bandits, improving content filtering and prioritization by modeling complex reward functions with high-dimensional data.
Contribution
It presents a novel neural network approach for top-K ranking in contextual bandits, handling non-linear reward structures and high-dimensional features.
Findings
Performs well with complex reward structures
Effective with high-dimensional contextual features
Outperforms traditional methods in experiments
Abstract
In today's technology environment, information is abundant, dynamic, and heterogeneous in nature. Automated filtering and prioritization of information is based on distinguishing whether or not the information adds substantial value toward one's goal. Contextual multi-armed bandits have been widely used for learning to filter content and prioritize it according to user interest or relevance. The Learn-to-Rank technique optimizes the relevance ranking of items, allowing the content to be selected accordingly. We propose a novel approach to top-K rankings under the contextual multi-armed bandit framework. We model the stochastic reward function with a neural network to allow non-linear approximation to learn the relationship between rewards and contexts. We demonstrate the approach and evaluate the performance of learning from the experiments using real world data sets in simulated…
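The idea in the abstract, a neural network that predicts reward from context and a selection of the K highest-scoring items, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the one-hidden-layer network, the epsilon-greedy exploration rule, the semi-bandit feedback (reward observed only for the K items shown), the synthetic reward function, and all names (`NeuralRewardModel`, `select_top_k`, `simulate`) are hypothetical.

```python
import numpy as np


class NeuralRewardModel:
    """Hypothetical one-hidden-layer MLP: item/context features -> predicted reward."""

    def __init__(self, dim, hidden=16, lr=0.02, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(dim), size=(dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), size=hidden)
        self.b2 = 0.0
        self.lr = lr

    def predict(self, X):
        # X has shape (n_items, dim); returns one score per item.
        H = np.tanh(X @ self.W1 + self.b1)
        return H @ self.w2 + self.b2

    def update(self, X, r):
        """One gradient-descent step on 0.5 * mean squared error."""
        H = np.tanh(X @ self.W1 + self.b1)
        err = (H @ self.w2 + self.b2) - r            # shape (n,)
        n = len(r)
        dH = np.outer(err, self.w2) * (1.0 - H**2)   # backprop through tanh
        self.w2 -= self.lr * (H.T @ err) / n
        self.b2 -= self.lr * err.mean()
        self.W1 -= self.lr * (X.T @ dH) / n
        self.b1 -= self.lr * dH.mean(axis=0)


def select_top_k(model, X, k, epsilon, rng):
    """Return indices of K items: random with prob. epsilon, else top-K by score."""
    if rng.random() < epsilon:
        return rng.permutation(len(X))[:k]           # explore (assumed rule)
    return np.argsort(-model.predict(X))[:k]         # exploit, best item first


def simulate(rounds=500, n_items=20, dim=5, k=3, epsilon=0.1, seed=0):
    """Run the bandit loop on a synthetic non-linear reward (assumption)."""
    rng = np.random.default_rng(seed)
    model = NeuralRewardModel(dim, seed=seed)
    total = 0.0
    for _ in range(rounds):
        X = rng.normal(size=(n_items, dim))          # contexts for this round
        chosen = select_top_k(model, X, k, epsilon, rng)
        # Synthetic noisy reward, observed only for the K chosen items.
        r = np.sin(X[chosen, 0]) + X[chosen, 1] * X[chosen, 2]
        r = r + rng.normal(0.0, 0.1, size=k)
        model.update(X[chosen], r)
        total += r.sum()
    return total / rounds                            # average reward per round
```

Calling `simulate()` runs the loop end to end and returns the average per-round reward. Epsilon-greedy is used here only because it is the simplest exploration rule to show; the abstract does not state which exploration strategy the authors use.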
