High-dimensional Nonparametric Contextual Bandit Problem
Shogo Iwazaki, Junpei Komiyama, Masaaki Imaizumi

TL;DR
This paper studies high-dimensional kernelized contextual bandits, proposing assumptions and analyses that enable no-regret learning and lenient regret bounds despite large feature spaces.
Contribution
It introduces stochastic context assumptions and analyzes lenient regret, extending understanding of learning in high-dimensional kernelized bandit problems.
Findings
No-regret learning is achievable with growing dimensions under stochastic assumptions.
Lenient regret rates are derived as a function of the allowed per-round regret Δ.
Addresses limitations of Gaussian kernel methods in high-dimensional settings.
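To make the notion of lenient regret concrete, here is a minimal sketch. It assumes a hinge-style definition, in which only the part of each round's gap that exceeds the tolerance is counted. The summary does not state which variant the paper uses, so the function names, the `delta` parameter, and the gap values are all illustrative assumptions.

```python
import numpy as np

def standard_regret(gaps):
    """Sum of per-round gaps between the best arm's reward and the chosen arm's."""
    return float(np.sum(np.asarray(gaps, dtype=float)))

def lenient_regret(gaps, delta):
    """Hinge-style lenient regret (an assumed variant).

    A round contributes only the amount by which its gap exceeds the
    tolerated per-round regret `delta`; gaps at or below `delta` are free.
    """
    gaps = np.asarray(gaps, dtype=float)
    return float(np.sum(np.maximum(gaps - delta, 0.0)))

# Hypothetical per-round gaps of a learner that improves over time.
gaps = np.array([0.5, 0.25, 0.125, 0.0625, 0.0])

print(standard_regret(gaps))        # every gap counts
print(lenient_regret(gaps, 0.125))  # only the excess over 0.125 counts
```

With a tolerance of zero the two quantities coincide, and a larger tolerance can only reduce the lenient regret, which is why rates are naturally expressed as a function of the tolerance.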
Abstract
We consider the kernelized contextual bandit problem with a large feature space. This problem involves K arms, and the goal of the forecaster is to maximize the cumulative rewards through learning the relationship between the contexts and the rewards. It serves as a general framework for various decision-making scenarios, such as personalized online advertising and recommendation systems. Kernelized contextual bandits generalize the linear contextual bandit problem and offer greater modeling flexibility. Existing methods, when applied to Gaussian kernels, yield a trivial bound of O(T) when we consider Ω(log T) feature dimensions. To address this, we introduce stochastic assumptions on the context distribution and show that no-regret learning is achievable even when the number of dimensions grows up to the number of samples. Furthermore, we analyze lenient regret, which…
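The abstract's remark that Gaussian-kernel analyses become trivial in high dimensions can be illustrated with a back-of-the-envelope calculation. The sketch below assumes the widely used information-gain bound of order (log T)^(d+1) for the Gaussian kernel and a simplified regret rate of order sqrt(T · (log T)^(d+1)), with all constants dropped. This proxy is an assumption for illustration and is not taken from the paper.

```python
import math

def gaussian_kernel_rate(T, d):
    """Constant-free proxy sqrt(T * (log T)^(d+1)) for a Gaussian-kernel regret bound.

    T is the number of rounds and d the feature dimension (both illustrative).
    """
    return math.sqrt(T * math.log(T) ** (d + 1))

def smallest_vacuous_dimension(T):
    """Smallest d at which the proxy reaches T, i.e. is no better than linear regret."""
    d = 1
    while gaussian_kernel_rate(T, d) < T:
        d += 1
    return d

T = 10**6
for d in (1, 2, 4, 8):
    # A ratio of 1 or more means the bound says nothing beyond the trivial one.
    print(d, gaussian_kernel_rate(T, d) / T)

print(smallest_vacuous_dimension(T))
```

Under this proxy the bound stops being informative at a dimension far smaller than the number of rounds, which motivates the additional assumptions on the context distribution.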
Taxonomy
Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms
