DART: aDaptive Accept RejecT for non-linear top-K subset identification
Mridul Agarwal, Vaneet Aggarwal, Christopher J. Quinn, Abhishek Umrawal

TL;DR
This paper introduces DART, a novel adaptive algorithm for non-linear top-K subset selection in bandit problems, capable of handling correlated rewards without linearity assumptions, and achieving near-optimal regret bounds.
Contribution
The paper presents DART, the first efficient algorithm for non-linear, correlated reward bandit problems that does not rely on individual arm feedback or linear reward models.
Findings
DART achieves a regret bound of Õ(K√(KNT)) over a time horizon T, which nearly matches the theoretical lower bound.
DART outperforms existing algorithms in cross-selling and reward maximization tasks.
The algorithm is computationally efficient with linear storage in N.
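Treating every size-K subset as its own arm of a standard multi-armed bandit gives C(N, K) actions. Per-arm bookkeeping needs only a handful of numbers per base arm. The short Python comparison below makes that gap concrete. The function names are hypothetical. It assumes one running reward sum and one count per arm (2N numbers) as one way to obtain storage linear in N; the paper's exact bookkeeping may differ.

```python
import math


def naive_action_count(n_arms: int, k: int) -> int:
    # A standard MAB over subsets treats every size-k subset as one action.
    return math.comb(n_arms, k)


def per_arm_counters(n_arms: int) -> int:
    # Assumed bookkeeping: one running reward sum and one count per base arm.
    return 2 * n_arms


for n, k in [(10, 3), (50, 5), (100, 10)]:
    print(f"N={n:>3}, K={k:>2}: {naive_action_count(n, k):,} subset actions vs "
          f"{per_arm_counters(n)} per-arm numbers")
```

Even at N = 50 and K = 5, the subset-level action space already exceeds two million entries, while the per-arm state stays at 100 numbers.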
Abstract
We consider the bandit problem of selecting K out of N arms at each time step. The reward can be a non-linear function of the rewards of the selected individual arms. The direct use of a multi-armed bandit algorithm requires choosing among (N choose K) options, making the action space large. To simplify the problem, existing works on combinatorial bandits typically assume feedback as a linear function of individual rewards. In this paper, we prove the lower bound for top-K subset selection with bandit feedback with possibly correlated rewards. We present a novel algorithm for the combinatorial setting without using individual arm feedback or requiring linearity of the reward function. Additionally, our algorithm works on correlated rewards of individual arms. Our algorithm, aDaptive Accept RejecT (DART), sequentially finds good arms and eliminates bad arms based on confidence…
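The accept/reject idea can be illustrated with a simplified Python sketch. This is not the authors' DART. The names `dart_style_topk`, `at_least_one_purchase` and `PURCHASE_PROBS` are hypothetical, and several pieces are assumptions made for illustration: the Hoeffding-style confidence radius with a crude union-bound constant, the shuffled round-robin that includes every undecided arm once per round, and the restart of estimates after each decision. The sketch observes only the joint reward of the played subset, never per-arm feedback. It scores each undecided arm by the average reward of the subsets that contained it. An arm is accepted once it is confidently among the top open slots and rejected once it is confidently outside them.

```python
import math
import random


def dart_style_topk(reward_fn, n_arms, k, horizon, delta=0.05, seed=0):
    """Hypothetical accept/reject sketch in the spirit of DART, not the paper's code.

    reward_fn(subset, rng) must return one joint reward in [0, 1] for the whole
    subset; per-arm rewards are never observed.
    """
    rng = random.Random(seed)
    accepted, undecided = [], list(range(n_arms))
    log_term = math.log(4 * n_arms * horizon / delta)  # assumed union-bound constant
    sums, counts = [0.0] * n_arms, [0] * n_arms  # O(N) storage: one sum, one count per arm
    t = 0
    while t < horizon and len(accepted) < k:
        slots = k - len(accepted)
        if len(undecided) <= slots:  # every remaining arm must belong to the top-k
            accepted += undecided
            undecided = []
            break
        # One round: shuffle, then play every undecided arm at least once,
        # always alongside all arms accepted so far.
        rng.shuffle(undecided)
        for start in range(0, len(undecided), slots):
            if t >= horizon:
                break
            chunk = undecided[start:start + slots]
            if len(chunk) < slots:  # pad the last chunk with other undecided arms
                chunk += [a for a in undecided if a not in chunk][: slots - len(chunk)]
            r = reward_fn(accepted + chunk, rng)
            t += 1
            for a in chunk:  # credit the joint reward to each undecided arm played
                sums[a] += r
                counts[a] += 1
        if t >= horizon:
            break
        m = len(undecided)
        mean = {a: sums[a] / counts[a] for a in undecided}
        rad = {a: math.sqrt(log_term / (2 * counts[a])) for a in undecided}
        accept, reject = [], []
        for i in undecided:
            others = [j for j in undecided if j != i]
            beats = sum(mean[i] - rad[i] > mean[j] + rad[j] for j in others)
            beaten_by = sum(mean[j] - rad[j] > mean[i] + rad[i] for j in others)
            if beats >= m - slots:  # confidently inside the top `slots` undecided arms
                accept.append(i)
            elif beaten_by >= slots:  # confidently outside them
                reject.append(i)
        if accept or reject:
            accepted += accept
            undecided = [a for a in undecided if a not in accept and a not in reject]
            for a in undecided:  # restart estimates against the new accepted set
                sums[a], counts[a] = 0.0, 0
    # Horizon reached: fill any open slots with the best empirical arms.
    open_slots = k - len(accepted)
    ranked = sorted(undecided, key=lambda a: sums[a] / counts[a] if counts[a] else 0.0,
                    reverse=True)
    return set(accepted + ranked[:open_slots]), t


PURCHASE_PROBS = (0.9, 0.8, 0.2, 0.15, 0.1, 0.05)  # hypothetical per-item probabilities


def at_least_one_purchase(subset, rng):
    # Non-linear joint reward: 1.0 if any selected item is bought, else 0.0.
    return float(any(rng.random() < PURCHASE_PROBS[a] for a in subset))


chosen, steps = dart_style_topk(at_least_one_purchase, n_arms=6, k=2, horizon=20_000)
print(sorted(chosen), steps)
```

The OR-style reward is a toy stand-in for cross-selling. With this reward, a strong accepted arm makes the remaining subsets' rewards look alike, so the confident phase can stall. In that case the final fallback to empirical means makes the last choice. This illustrates why non-linear rewards are harder than linear ones.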
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Optimization and Search Problems
