Master-slave Deep Architecture for Top-K Multi-armed Bandits with Non-linear Bandit Feedback and Diversity Constraints

Hanchi Huang; Li Shen; Deheng Ye; Wei Liu

arXiv:2308.12680 · cs.LG · August 25, 2023

TL;DR

This paper introduces a master-slave neural architecture for top-K combinatorial multi-armed bandits with non-linear bandit feedback and diversity constraints, and reports better performance than existing algorithms on synthetic and real recommendation datasets.

Contribution

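It presents a master-slave framework in which six slave models generate diversified candidate samples, improved by teacher learning-based optimization and policy co-training, and a master model selects among the candidates with a neural contextual UCB-based network.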

Findings

01 Outperforms existing algorithms on synthetic datasets.

02 Achieves significant improvements in real recommendation tasks.

03 Effectively balances exploration and exploitation with neural contextual UCB.

Abstract

We propose a novel master-slave architecture to solve the top-$K$ combinatorial multi-armed bandits problem with non-linear bandit feedback and diversity constraints, which, to the best of our knowledge, is the first combinatorial bandits setting considering diversity constraints under bandit feedback. Specifically, to efficiently explore the combinatorial and constrained action space, we introduce six slave models with distinguished merits to generate diversified samples well balancing rewards and constraints as well as efficiency. Moreover, we propose teacher learning based optimization and the policy co-training technique to boost the performance of the multiple slave models. The master model then collects the elite samples provided by the slave models and selects the best sample estimated by a neural contextual UCB-based network to make a decision with a trade-off between…
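
The candidate-then-select loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the two "slave" generators, the per-category cap used as the diversity constraint, the additive reward estimate, and the count-based exploration bonus are all simplifying assumptions made here for illustration. The paper itself uses six slave models and a neural contextual UCB-based network under non-linear feedback. All function and parameter names below are hypothetical.

```python
import math
import random


def is_diverse(subset, category, max_per_category):
    """Hypothetical diversity constraint: at most max_per_category arms per category."""
    counts = {}
    for arm in subset:
        c = category[arm]
        counts[c] = counts.get(c, 0) + 1
        if counts[c] > max_per_category:
            return False
    return True


def greedy_slave(scores, category, k, max_per_category):
    """Toy 'slave': add arms in descending score order while the set stays feasible."""
    chosen = []
    for arm in sorted(range(len(scores)), key=lambda a: -scores[a]):
        if is_diverse(chosen + [arm], category, max_per_category):
            chosen.append(arm)
        if len(chosen) == k:
            break
    return tuple(sorted(chosen))


def random_slave(scores, category, k, max_per_category, rng):
    """Toy 'slave': build a random feasible set, which adds exploration."""
    arms = list(range(len(scores)))
    rng.shuffle(arms)
    chosen = []
    for arm in arms:
        if is_diverse(chosen + [arm], category, max_per_category):
            chosen.append(arm)
        if len(chosen) == k:
            break
    return tuple(sorted(chosen))


def master_select(candidates, scores, pulls, t, c=1.0):
    """Toy 'master': pick the candidate set with the highest UCB-style score.

    The estimate is a plain sum of per-arm scores (an assumption; the paper
    handles non-linear feedback with a neural network), and the bonus is a
    count-based stand-in for the neural UCB uncertainty term.
    """
    def ucb(subset):
        estimate = sum(scores[a] for a in subset)
        bonus = c * sum(math.sqrt(math.log(t + 1) / (pulls[a] + 1)) for a in subset)
        return estimate + bonus

    return max(candidates, key=ucb)


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.2, 0.1]   # assumed per-arm reward estimates
    category = [0, 0, 0, 1, 1]           # assumed category label per arm
    pulls = [0] * len(scores)            # times each arm has been shown
    rng = random.Random(0)
    candidates = [
        greedy_slave(scores, category, 3, 2),
        random_slave(scores, category, 3, 2, rng),
    ]
    print(master_select(candidates, scores, pulls, t=1))
```

In this sketch the master prefers the high-estimate set while all arms are equally unexplored, and shifts toward sets containing rarely shown arms once the counts in `pulls` become uneven.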

Code & Models

Repositories

huanghanchi/master-slave-algorithm-for-top-k-bandits
PyTorch · Official

Taxonomy

Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Machine Learning and Algorithms