Semi-overlapping Multi-bandit Best Arm Identification for Sequential Support Network Learning

András Antos (1), András Millinghoffer (1, 2), Péter Antal (1, 2) ((1) Department of Artificial Intelligence and Systems Engineering, Budapest University of Technology and Economics, (2) E-Group ICT Software Zrt., Budapest, Hungary)

arXiv:2512.24959·cs.LG·January 1, 2026

Open Access

TL;DR

This paper introduces a new semi-overlapping multi-bandit model and a generalized algorithm for efficiently identifying optimal candidates in complex, shared evaluation settings, with applications in various multi-agent learning scenarios.

Contribution

The paper proposes the semi-overlapping multi-bandit (SOMMAB) model and develops a generalized GapE algorithm with improved error bounds for support network learning.

Findings

1. Exponential error bounds scale linearly with overlap degree.

2. Shared evaluations significantly reduce sample complexity.

3. Theoretical guarantees improve upon existing multi-bandit best-arm identification methods.

Abstract

Many modern AI and ML problems require evaluating partners' contributions through shared yet asymmetric, computationally intensive processes and the simultaneous selection of the most beneficial candidates. Sequential approaches to these problems can be unified under a new framework, Sequential Support Network Learning (SSNL), in which the goal is to select the most beneficial candidate set of partners for all participants using trials; that is, to learn a directed graph that represents the highest-performing contributions. We demonstrate that a new pure-exploration model, the semi-overlapping multi-(multi-armed) bandit (SOMMAB), in which a single evaluation provides distinct feedback to multiple bandits due to structural overlap among their arms, can be used to learn a support network from sparse candidate lists efficiently. We develop a generalized GapE algorithm for SOMMABs and…
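To make the shared-feedback idea concrete, here is a minimal Python sketch. It is not the paper's algorithm: the abstract is truncated before the method is specified, so everything here is an illustrative assumption. It assumes each participant `i` is a bandit whose arms are the other participants, that one evaluation of the pair `{i, j}` yields separate Bernoulli feedback for arm `(i, j)` and arm `(j, i)`, and that arms are chosen with a GapE-style index (negative estimated gap plus an exploration bonus). The function name `gape_pairwise_overlap`, the parameter `a`, and the `means` matrix are hypothetical.

```python
import math
import random


def gape_pairwise_overlap(means, budget, a=2.0, seed=0):
    """GapE-style sketch with pairwise overlap (illustrative assumption only).

    means[i][j] is a hypothetical expected benefit of partner j to participant i.
    Returns the estimated best partner per participant and the sample counts.
    """
    n = len(means)
    if n < 3:
        raise ValueError("need at least 3 participants so each bandit has 2+ arms")
    rng = random.Random(seed)
    arms = [(i, j) for i in range(n) for j in range(n) if i != j]
    counts = {arm: 0 for arm in arms}
    sums = {arm: 0.0 for arm in arms}

    def evaluate(i, j):
        # One shared evaluation of {i, j} gives distinct feedback to both arms.
        for b, k in ((i, j), (j, i)):
            reward = 1.0 if rng.random() < means[b][k] else 0.0
            counts[(b, k)] += 1
            sums[(b, k)] += reward

    def estimate(arm):
        return sums[arm] / counts[arm]

    # Initialization: evaluate every unordered pair once
    # (this spends n*(n-1)/2 evaluations even if budget is smaller).
    used = 0
    for i in range(n):
        for j in range(i + 1, n):
            evaluate(i, j)
            used += 1

    while used < budget:
        best_arm, best_index = None, -math.inf
        for (i, j) in arms:
            # Estimated gap of arm j within bandit i: distance to its best rival.
            rival = max(estimate((i, k)) for k in range(n) if k != i and k != j)
            gap = abs(rival - estimate((i, j)))
            index = -gap + math.sqrt(a / counts[(i, j)])
            if index > best_index:
                best_arm, best_index = (i, j), index
        evaluate(*best_arm)  # also updates the mirrored arm (j, i)
        used += 1

    best = [
        max((j for j in range(n) if j != i), key=lambda j: estimate((i, j)))
        for i in range(n)
    ]
    return best, counts


# Toy instance: participant i benefits most from partner (i + 1) % 4.
means = [[0.9 if j == (i + 1) % 4 else 0.1 for j in range(4)] for i in range(4)]
best, counts = gape_pairwise_overlap(means, budget=2000)
print(best)
```

In this sketch every evaluation updates two arms, so the total number of samples collected is twice the evaluation budget; that doubling is the kind of saving from shared evaluations the abstract alludes to, under the pairwise-overlap assumption made here.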

Taxonomy

Topics: Advanced Bandit Algorithms Research · Domain Adaptation and Few-Shot Learning · Advanced Graph Neural Networks