Co-Exploration and Co-Exploitation via Shared Structure in Multi-Task Bandits

Sumantrak Mukherjee; Serafima Lebedeva; Valentin Margraf; Jonas Hanselle; Kanta Yamaoka; Viktor Bengs; Stefan Konigorski; Eyke Hüllermeier; Sebastian Josef Vollmer

arXiv:2512.12693 · cs.LG · December 16, 2025

TL;DR

This paper introduces a Bayesian framework for multi-task bandits that leverages shared structure to improve exploration, especially under partial observations and complex latent dependencies.

Contribution

It presents a novel particle-based Bayesian approach that models joint task-reward distributions, enabling flexible discovery of inter-task and inter-arm dependencies.

Findings

01 Outperforms hierarchical bandit models in complex settings

02 Effectively handles model misspecification

03 Learns latent reward structures without prior assumptions

Abstract

We propose a novel Bayesian framework for efficient exploration in contextual multi-task multi-armed bandit settings, where the context is only observed partially and dependencies between reward distributions are induced by latent context variables. In order to exploit these structural dependencies, our approach integrates observations across all tasks and learns a global joint distribution, while still allowing personalised inference for new tasks. In this regard, we identify two key sources of epistemic uncertainty, namely structural uncertainty in the latent reward dependencies across arms and tasks, and user-specific uncertainty due to incomplete context and limited interaction history. To put our method into practice, we represent the joint distribution over tasks and rewards using a particle-based approximation of a log-density Gaussian process. This representation enables…
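To make the idea of pooling observations across tasks while still personalising inference more concrete, here is a minimal Python sketch. It is not the authors' implementation: the paper represents the joint distribution with a particle-based approximation of a log-density Gaussian process, whereas this sketch swaps in a much simpler stand-in. Each particle is assumed to be a table of Bernoulli reward means indexed by a discrete latent task type and an arm, with uniform priors over particles and types. All function names (`task_type_loglik`, `particle_weights`, `thompson_arm`) and the toy data are hypothetical.

```python
import numpy as np

def logsumexp(x):
    """Numerically stable log(sum(exp(x)))."""
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

def task_type_loglik(theta, history):
    """Log-likelihood of one task's history under each latent type.

    theta: (n_types, n_arms) Bernoulli reward means of one particle (assumed model).
    history: list of (arm, reward) pairs with reward in {0, 1}.
    """
    ll = np.zeros(theta.shape[0])
    for arm, reward in history:
        p = np.clip(theta[:, arm], 1e-6, 1 - 1e-6)
        ll += np.log(p) if reward == 1 else np.log1p(-p)
    return ll

def particle_weights(particles, histories):
    """Posterior over particles given every task's history (structural uncertainty)."""
    n_types = particles.shape[1]
    log_w = np.zeros(len(particles))
    for i, theta in enumerate(particles):
        for history in histories:
            # Marginalise the task's unobserved type under a uniform prior.
            log_w[i] += logsumexp(task_type_loglik(theta, history)) - np.log(n_types)
    w = np.exp(log_w - logsumexp(log_w))
    return w / w.sum()

def thompson_arm(particles, weights, history, rng):
    """Thompson-style choice for one task: sample a structure, then a type."""
    theta = particles[rng.choice(len(particles), p=weights)]
    ll = task_type_loglik(theta, history)
    post_z = np.exp(ll - logsumexp(ll))
    post_z = post_z / post_z.sum()
    z = rng.choice(len(post_z), p=post_z)  # task-specific uncertainty
    return int(np.argmax(theta[z]))

# Toy usage: 50 random particles, 2 latent types, 3 arms, two tasks.
rng = np.random.default_rng(0)
particles = rng.uniform(size=(50, 2, 3))
histories = [[(0, 1), (1, 0)], [(2, 1)]]
weights = particle_weights(particles, histories)
arm = thompson_arm(particles, weights, histories[0], rng)
```

The particle weights are computed from all tasks' histories, which loosely mirrors the structural uncertainty named in the abstract, while the per-task posterior over the latent type mirrors the user-specific uncertainty. A task with an empty history falls back to the uniform prior over types, so a new task can still benefit from the structure learned on the others.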

Taxonomy

Topics: Advanced Bandit Algorithms Research · Explainable Artificial Intelligence (XAI) · Gaussian Processes and Bayesian Inference