From Contextual Combinatorial Semi-Bandits to Bandit List Classification: Improved Sample Complexity with Sparse Rewards
Liad Erez, Tomer Koren

TL;DR
This paper introduces a new algorithm for contextual combinatorial semi-bandits with sparse rewards, achieving improved sample complexity bounds and extending to list classification and adversarial settings.
Contribution
It provides a novel sample complexity bound for the $(\epsilon,\delta)$-PAC setting in sparse regimes, generalizes list multiclass classification, and extends regret bounds to adversarial data.
Findings
Sample complexity improves when the sparsity $s$ is much smaller than the number of actions $K$.
The algorithm is computationally efficient given access to an ERM oracle.
Extends bounds to adversarial and list classification scenarios.
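For readers unfamiliar with the term, an ERM (empirical risk minimization) oracle for a finite policy class can be pictured as a routine that returns the policy with the largest total empirical reward on a dataset. The sketch below is a brute-force illustration only: the name `erm_oracle`, the data layout, and the two toy policies are assumptions made for this example, not the paper's interface, and a practical oracle would not enumerate the class.

```python
from typing import Callable, Dict, FrozenSet, Hashable, List, Sequence, Tuple

Context = Hashable
Policy = Callable[[Context], FrozenSet[int]]          # context -> set of chosen actions
Example = Tuple[Context, Dict[int, float]]            # (context, reward per action)


def erm_oracle(policies: Sequence[Policy], data: List[Example]) -> Policy:
    """Return the policy with the largest total reward on `data`.

    Hypothetical brute-force version: it scores every policy in the
    (finite) class, which is only feasible for tiny classes.
    """
    def total_reward(policy: Policy) -> float:
        # Sum the rewards of the actions the policy selects on each context;
        # actions missing from the reward dict count as reward 0.
        return sum(
            sum(rewards.get(action, 0.0) for action in policy(context))
            for context, rewards in data
        )

    return max(policies, key=total_reward)


# Toy usage: two constant policies over four actions.
policy_a: Policy = lambda context: frozenset({0, 1})
policy_b: Policy = lambda context: frozenset({2, 3})
dataset: List[Example] = [("u1", {2: 1.0}), ("u2", {0: 1.0, 3: 1.0})]
best = erm_oracle([policy_a, policy_b], dataset)
```

Here `policy_b` collects a total reward of 2 against 1 for `policy_a`, so it is the one returned.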
Abstract
We study the problem of contextual combinatorial semi-bandits, where input contexts are mapped into subsets of size $m$ of a collection of $K$ possible actions. In each round, the learner observes the realized reward of the predicted actions. Motivated by prototypical applications of contextual bandits, we focus on the $s$-sparse regime where we assume that the sum of rewards is bounded by some value $s \ll K$. For example, in recommendation systems the number of products purchased by any customer is significantly smaller than the total number of available products. Our main result is for the $(\epsilon,\delta)$-PAC variant of the problem, for which we design an algorithm that returns an $\epsilon$-optimal policy with high probability, with a sample complexity bound stated in terms of $\Pi$, the underlying (finite) class, and $s$, the sparsity…
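The interaction protocol described in the abstract can be sketched as a small simulation. This is an illustration under stated assumptions, not the paper's algorithm: `K` denotes the number of actions, `m` the subset size and `s` the sparsity level, and the binary rewards, integer contexts, and the fixed `policy` are all hypothetical choices made for the example.

```python
import random


def sample_sparse_rewards(K, s, rng):
    """Binary reward vector over K actions with at most s ones (sum <= s)."""
    support = rng.sample(range(K), rng.randint(0, s))
    rewards = [0] * K
    for action in support:
        rewards[action] = 1
    return rewards


def semi_bandit_round(policy, context, rewards, m):
    """Play the m actions chosen by `policy` and reveal only their rewards."""
    chosen = policy(context)
    if len(set(chosen)) != m:
        raise ValueError("policy must return exactly m distinct actions")
    # Semi-bandit feedback: rewards of unplayed actions stay hidden.
    return {action: rewards[action] for action in chosen}


K, m, s = 10, 3, 2            # assumed toy sizes with s much smaller than K
rng = random.Random(0)


def policy(context):
    # Hypothetical fixed policy: m consecutive actions starting at the context.
    return [(context + i) % K for i in range(m)]


history = []
for _ in range(5):
    context = rng.randrange(K)
    rewards = sample_sparse_rewards(K, s, rng)
    feedback = semi_bandit_round(policy, context, rewards, m)
    history.append((context, feedback))
```

Each entry of `history` holds the rewards of exactly `m` played actions, and because the full reward vector sums to at most `s`, the reward collected in any round is also at most `s`.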
Taxonomy
Topics: Data Stream Mining Techniques · Anomaly Detection Techniques and Applications
Methods: Sparse Evolutionary Training
