A General Framework for Clustering and Distribution Matching with Bandit Feedback
Recep Can Yavas, Yuqi Huang, Vincent Y. F. Tan, and Jonathan Scarlett

TL;DR
This paper introduces a comprehensive framework for clustering and distribution matching in bandit settings, providing theoretical bounds and an efficient algorithm that approaches the fundamental limits of arm pull efficiency.
Contribution
It develops a unified framework for various clustering and distribution matching problems with bandit feedback, deriving lower bounds and proposing an algorithm that asymptotically achieves these bounds.
Findings
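Derived a non-asymptotic lower bound on the average number of arm pulls.
Proposed a computationally efficient algorithm whose average number of arm pulls matches the lower bound asymptotically.
Derived a novel bound on the rate at which the algorithm's average number of arm pulls converges to the lower bound as the error probability decreases.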
Abstract
We develop a general framework for clustering and distribution matching problems with bandit feedback. We consider a multi-armed bandit model where some subset of arms is partitioned into groups. Within each group, the random variable associated with each arm follows the same distribution on a finite alphabet. At each time step, the decision maker pulls an arm and observes its outcome from the random variable associated with that arm. Subsequent arm pulls depend on the history of arm pulls and their outcomes. The decision maker has no knowledge of the distributions of the arms or the underlying partitions. The task is to devise an online algorithm to learn the underlying partition of arms with the least number of arm pulls on average and with an error probability not exceeding a pre-determined value. Several existing problems fall under our general framework, including…
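The sampling model described in the abstract can be illustrated with a small toy environment. The sketch below is an assumption-laden illustration, not the paper's algorithm: the class name GroupedBandit, the helper empirical_distributions, and all parameter names are hypothetical, every arm is assumed to belong to some group (the abstract only requires a subset to be grouped), and arms are sampled uniformly rather than adaptively.

```python
import random
from collections import Counter


class GroupedBandit:
    """Toy environment (hypothetical): arms in the same group share one distribution."""

    def __init__(self, group_of_arm, group_dists, seed=0):
        # group_of_arm[i] is the hidden group label of arm i.
        # group_dists[g] lists the probabilities of symbols 0..A-1 for group g.
        self.group_of_arm = list(group_of_arm)
        self.group_dists = [list(p) for p in group_dists]
        self.rng = random.Random(seed)
        self.num_pulls = 0  # total arm pulls so far (the cost to be minimized)

    def pull(self, arm):
        """Pull one arm and return a symbol drawn from its group's distribution."""
        probs = self.group_dists[self.group_of_arm[arm]]
        self.num_pulls += 1
        return self.rng.choices(range(len(probs)), weights=probs, k=1)[0]


def empirical_distributions(env, num_arms, alphabet_size, pulls_per_arm):
    """Uniform, non-adaptive sampling (an assumption, not the paper's rule).

    Returns one empirical probability vector per arm.
    """
    estimates = []
    for arm in range(num_arms):
        counts = Counter(env.pull(arm) for _ in range(pulls_per_arm))
        estimates.append([counts[s] / pulls_per_arm for s in range(alphabet_size)])
    return estimates


# Four arms in two hidden groups over a binary alphabet.
env = GroupedBandit(group_of_arm=[0, 0, 1, 1],
                    group_dists=[[0.9, 0.1], [0.1, 0.9]])
estimates = empirical_distributions(env, num_arms=4, alphabet_size=2,
                                    pulls_per_arm=200)
```

Arms whose empirical distributions are close would be candidates for the same group. An online algorithm of the kind the abstract describes would instead choose each pull based on the history so far and stop once the target error probability is met.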
