Laplacian Kernelized Bandit

Shuang Wu; Arash A. Amini

arXiv:2601.00461 · cs.LG · January 5, 2026

Open Access · 3 Reviews

TL;DR

This paper introduces a novel multi-user kernel for bandit problems that combines the graph Laplacian with a base arm kernel, enabling effective exploration in non-linear, graph-structured reward settings with theoretical guarantees.

Contribution

It develops a unified multi-user RKHS whose reproducing kernel integrates the graph Laplacian with a base arm kernel, and designs algorithms whose regret bounds depend on an effective dimension rather than on the number of users.
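One way to see how such a kernel can arise is the short derivation below. It assumes (the paper's exact constants may differ) that the joint penalty is $\alpha$ times a weighted sum of squared RKHS distances over the edge set $E$ (each undirected edge counted once, weights $w_{uv}$, $W$ the weight matrix, $D$ the degree matrix) plus $\beta$ times the individual squared norms, with hypothetical constants $\alpha \ge 0$, $\beta > 0$. Expanding the distances gives a quadratic form in the Laplacian $L = D - W$, and the standard multi-task kernel construction then gives a reproducing kernel on user-arm pairs $(u, x)$:

```latex
\begin{aligned}
\operatorname{pen}(\{f_u\})
  &= \alpha \sum_{(u,v)\in E} w_{uv}\,\|f_u - f_v\|_{\mathcal H}^2
     + \beta \sum_{u} \|f_u\|_{\mathcal H}^2 \\
  &= \sum_{u,v} \big(\alpha L + \beta I\big)_{uv}\,\langle f_u, f_v\rangle_{\mathcal H},
     \qquad L = D - W, \\
K\big((u,x),(v,x')\big)
  &= \big[(\alpha L + \beta I)^{-1}\big]_{uv}\; k(x,x').
\end{aligned}
```

The second line holds because each edge contributes $w_{uv}\big(\|f_u\|_{\mathcal H}^2 + \|f_v\|_{\mathcal H}^2 - 2\langle f_u, f_v\rangle_{\mathcal H}\big)$, which matches the diagonal entries $L_{uu} = d_u$ and off-diagonal entries $L_{uv} = -w_{uv}$. Taking $\beta > 0$ makes $\alpha L + \beta I$ positive definite and therefore invertible.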

Findings

01. Outperforms linear and non-graph baselines in non-linear settings.
02. Remains competitive when rewards are linear.
03. Provides a scalable regret bound based on effective dimension.
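The regret bound is stated in terms of an effective dimension instead of the user count. As an illustration only, the NumPy sketch below computes one common notion from the kernel-bandit literature, $d_{\mathrm{eff}}(\lambda) = \operatorname{tr}\big(K(K+\lambda I)^{-1}\big)$, on a lifted Gram matrix. The paper's own definition may differ (for example, a log-determinant information-gain quantity). The kernel form $[(\alpha L + \beta I)^{-1}]_{uv}\,k(x,x')$, the RBF base kernel, the complete graph, and all function names and constants are hypothetical.

```python
import numpy as np

def rbf(X, Y, lengthscale=1.0):
    """Base arm kernel k(x, x'): squared-exponential, an assumed choice."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2 * X @ Y.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def user_kernel(W, alpha=1.0, beta=0.1):
    """(alpha * L + beta * I)^{-1} with L = D - W; hypothetical parameterization."""
    L = np.diag(W.sum(axis=1)) - W
    return np.linalg.inv(alpha * L + beta * np.eye(len(W)))

def effective_dimension(K, lam=1.0):
    """tr(K (K + lam I)^{-1}) = sum_i mu_i / (mu_i + lam) over eigenvalues mu_i of K."""
    mu = np.clip(np.linalg.eigvalsh(K), 0.0, None)  # clip tiny negative round-off
    return float(np.sum(mu / (mu + lam)))

rng = np.random.default_rng(0)
T = 200  # number of observed (user, arm) pairs
for n_users in (5, 20, 80):
    W = np.ones((n_users, n_users)) - np.eye(n_users)  # complete graph, unit weights
    users = rng.integers(0, n_users, size=T)           # user behind each observation
    X = rng.normal(size=(T, 2))                        # 2-d arm features
    # Lifted Gram matrix: user-kernel entry times base kernel (Schur product, so PSD).
    K = user_kernel(W)[np.ix_(users, users)] * rbf(X, X)
    print(n_users, round(effective_dimension(K), 2))
```

The loop only prints the quantity for a few synthetic graph sizes so its behavior can be inspected. It is not a reproduction of the paper's experiments.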

Abstract

We study multi-user contextual bandits where users are related by a graph and their reward functions exhibit both non-linear behavior and graph homophily. We introduce a principled joint penalty for the collection of user reward functions $\{f_u\}$, combining a graph smoothness term based on RKHS distances with an individual roughness penalty. Our central contribution is proving that this penalty is equivalent to the squared norm within a single, unified *multi-user RKHS*. We explicitly derive its reproducing kernel, which elegantly fuses the graph Laplacian with the base arm kernel. This unification allows us to reframe the problem as learning a single "lifted" function, enabling the design of principled algorithms, LK-GP-UCB and LK-GP-TS, that leverage Gaussian Process posteriors over this new kernel for exploration. We provide high-probability regret bounds…
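To illustrate how a GP posterior over a unified user-arm kernel can drive exploration, here is a minimal NumPy sketch. It assumes the lifted kernel takes the multi-task form $K((u,x),(v,x')) = [(\alpha L + \beta I)^{-1}]_{uv}\,k(x,x')$ with an RBF base kernel, Gaussian observation noise, and a fixed exploration width `c`. All function names, constants, and data are hypothetical. This is not the authors' LK-GP-UCB or LK-GP-TS implementation; in particular, their confidence-width schedule is not reproduced.

```python
import numpy as np

def rbf(X, Y, lengthscale=1.0):
    """Base arm kernel k(x, x'): squared-exponential, an assumed choice."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2 * X @ Y.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def user_kernel(W, alpha=1.0, beta=0.1):
    """(alpha * L + beta * I)^{-1} with L = D - W; hypothetical parameterization."""
    L = np.diag(W.sum(axis=1)) - W
    return np.linalg.inv(alpha * L + beta * np.eye(len(W)))

def lifted_kernel(u1, X1, u2, X2, Ku):
    """K((u, x), (v, x')) = Ku[u, v] * k(x, x') on lifted inputs (user, arm)."""
    return Ku[np.ix_(u1, u2)] * rbf(X1, X2)

def gp_posterior(u_hist, X_hist, y_hist, u_cand, X_cand, Ku, noise=0.1):
    """Posterior mean and covariance of the lifted GP at candidate (user, arm) pairs."""
    K = lifted_kernel(u_hist, X_hist, u_hist, X_hist, Ku) + noise**2 * np.eye(len(y_hist))
    Ks = lifted_kernel(u_cand, X_cand, u_hist, X_hist, Ku)    # (m, t) cross-covariance
    Kss = lifted_kernel(u_cand, X_cand, u_cand, X_cand, Ku)   # (m, m) prior covariance
    C = np.linalg.cholesky(K)                                 # K = C C^T
    coef = np.linalg.solve(C.T, np.linalg.solve(C, y_hist))   # K^{-1} y
    V = np.linalg.solve(C, Ks.T)                              # C^{-1} Ks^T
    mean = Ks @ coef
    cov = Kss - V.T @ V
    return mean, 0.5 * (cov + cov.T)                          # symmetrize round-off

def select_ucb(mean, cov, c=2.0):
    """UCB-style choice: argmax of mean + c * posterior std (fixed width c, assumed)."""
    sd = np.sqrt(np.clip(np.diag(cov), 0.0, None))
    return int(np.argmax(mean + c * sd))

def select_ts(mean, cov, rng):
    """Thompson-style choice: argmax of one joint posterior sample."""
    sample = rng.multivariate_normal(mean, cov + 1e-8 * np.eye(len(mean)))
    return int(np.argmax(sample))

# Usage on synthetic data: 4 users on a path graph, 10 candidate arms.
rng = np.random.default_rng(0)
W = np.diag(np.ones(3), 1)
W = W + W.T                                   # path graph 0-1-2-3, unit weights
Ku = user_kernel(W)
arms = rng.normal(size=(10, 2))               # hypothetical arm features
u_hist = rng.integers(0, 4, size=15)          # users behind 15 past rounds
X_hist = arms[rng.integers(0, 10, size=15)]   # arms pulled in those rounds
y_hist = np.sin(X_hist[:, 0]) + 0.1 * rng.normal(size=15)  # synthetic rewards
u_cand = np.full(10, 2)                       # the current user is user 2
mean, cov = gp_posterior(u_hist, X_hist, y_hist, u_cand, arms, Ku)
print(select_ucb(mean, cov), select_ts(mean, cov, rng))
```

Because the kernel couples users through $(\alpha L + \beta I)^{-1}$, observations collected from other users, weighted by the corresponding user-kernel entries, also shrink user 2's posterior variance. This is the mechanism by which graph homophily can reduce the exploration each user needs.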

Peer Reviews

Decision: ICLR 2026 Poster

Reviewer 01 · Rating: 6 · Confidence: 4

Strengths

- The paper tackles an under-explored but interesting problem.
- The formulation of the penalty and its link to an RKHS is interesting and provides guidance on how relational information can be taken into account in learning.
- Theoretical results on regret bounds are a plus.
- The paper is generally well presented with clear motivation and technical descriptions.

Weaknesses

**Connection to existing literature.** I believe it would be helpful for the authors to make a greater effort to connect their approach to similar attempts in the literature:
- First, the link between kernels and regularisations, as well as its extension to the graph case via the graph Laplacian, has been well-documented in the literature [1], and in my view this should be properly discussed in the derivation of the proposed approach.
- Second, the way the graph structure is incorporated into le…

Reviewer 02 · Rating: 6 · Confidence: 4

Strengths

1. Theorem 2.1 provides a clean theoretical connection between Laplacian regularization across users and RKHS-based regularization on arm features. This connection is elegant and makes the overall framework principled and well-founded.
2. The proposed GP-based UCB algorithm is grounded in solid theoretical foundations. The corresponding regret bounds based on the effective dimension are sharp and well-justified, making the theoretical contribution of the paper clear and convincing.

Weaknesses

1. The formulation mainly builds on previous work on linear Laplacian bandits by extending the idea from linear models to general RKHS functions. While the conceptual novelty is moderate, I think this extension is meaningful and technically non-trivial, as it requires re-deriving the RKHS characterization and the associated regret analysis.
2. I am a bit confused about the result for TS, as described below in detail.

Reviewer 03 · Rating: 6 · Confidence: 3

Strengths

Please see above.

Weaknesses

Please see above.

Taxonomy

Topics: Advanced Bandit Algorithms Research · Gaussian Processes and Bayesian Inference · Advanced Graph Neural Networks