DART: aDaptive Accept RejecT for non-linear top-K subset identification

Mridul Agarwal; Vaneet Aggarwal; Christopher J. Quinn; Abhishek Umrawal

arXiv:2011.07687 · cs.LG · February 16, 2026

Open Access

TL;DR

This paper introduces DART, a novel adaptive algorithm for non-linear top-K subset selection in bandit problems, capable of handling correlated rewards without linearity assumptions, and achieving near-optimal regret bounds.

Contribution

The paper presents DART, the first efficient algorithm for non-linear, correlated reward bandit problems that does not rely on individual arm feedback or linear reward models.

Findings

01

DART achieves a regret bound of $\tilde{O}(K\sqrt{KNT})$ that nearly matches the theoretical lower bound.

02

DART outperforms existing algorithms in cross-selling and reward maximization tasks.

03

The algorithm is computationally efficient with linear storage in N.
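To give a sense of why storage linear in N matters, the short sketch below compares the number of K-subsets a naive multi-armed bandit would have to track with a per-arm budget. The problem size (N = 50, K = 5) and the assumption that a per-arm scheme keeps one running mean and one count per arm are illustrative choices, not figures taken from the paper.

```python
import math

# Hypothetical problem size, chosen only for illustration.
N, K = 50, 5

# A naive bandit over subsets needs one entry per K-subset of the N arms.
num_subsets = math.comb(N, K)

# Assumption: a per-arm scheme stores a running mean and a count for each arm.
per_arm_slots = 2 * N

print(f"K-subsets to track naively: {num_subsets}")
print(f"Per-arm statistics (assumed): {per_arm_slots}")
```

Even at this modest size, the subset count is several orders of magnitude larger than anything that grows linearly in N.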

Abstract

We consider the bandit problem of selecting $K$ out of $N$ arms at each time step. The reward can be a non-linear function of the rewards of the selected individual arms. The direct use of a multi-armed bandit algorithm requires choosing among $\binom{N}{K}$ options, making the action space large. To simplify the problem, existing works on combinatorial bandits typically assume feedback as a linear function of individual rewards. In this paper, we prove the lower bound for top-$K$ subset selection with bandit feedback with possibly correlated rewards. We present a novel algorithm for the combinatorial setting without using individual arm feedback or requiring linearity of the reward function. Additionally, our algorithm works on correlated rewards of individual arms. Our algorithm, aDaptive Accept RejecT (DART), sequentially finds good arms and eliminates bad arms based on confidence…
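The accept/reject idea mentioned in the abstract can be illustrated with a generic confidence-interval rule. The sketch below is a simplified stand-in, not the authors' exact procedure: the function name `accept_reject`, the shared confidence `radius`, and the assumption that per-arm estimates are already available are all hypothetical. An arm is accepted once its lower bound clears the upper bounds of at least N - K other arms, and rejected once at least K other arms have lower bounds above its upper bound.

```python
def accept_reject(estimates, radius, K):
    """Split arms into accepted / rejected sets using confidence intervals.

    estimates: per-arm quality estimates (assumed given; hypothetical input)
    radius:    shared confidence radius around each estimate (assumption)
    K:         number of arms to select
    Arms in neither returned set remain undecided.
    """
    n = len(estimates)
    lower = [m - radius for m in estimates]
    upper = [m + radius for m in estimates]
    accepted, rejected = set(), set()
    for i in range(n):
        # Arms that are confidently worse than arm i.
        beaten = sum(1 for j in range(n) if j != i and upper[j] < lower[i])
        # Arms that are confidently better than arm i.
        beating = sum(1 for j in range(n) if j != i and lower[j] > upper[i])
        if beaten >= n - K:
            accepted.add(i)   # confidently among the top K
        elif beating >= K:
            rejected.add(i)   # confidently outside the top K
    return accepted, rejected


# Tight intervals separate all four arms; wide intervals leave most undecided.
tight = accept_reject([0.9, 0.8, 0.5, 0.1], radius=0.05, K=2)
wide = accept_reject([0.9, 0.8, 0.5, 0.1], radius=0.3, K=2)
```

As the confidence radius shrinks with more observations, more arms move from undecided into one of the two sets, which is the sequential behaviour the abstract describes.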

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Optimization and Search Problems