Towards Domain Adaptive Neural Contextual Bandits

Ziyan Wang; Xiaoming Huo; Hao Wang

arXiv:2406.09564·cs.LG·April 8, 2025

Open Access·3 Reviews

TL;DR

This paper introduces a novel domain adaptation method for contextual bandits, enabling models to adapt across different domains with distribution shifts while maintaining low regret, supported by theoretical guarantees and empirical success.

Contribution

It presents the first general domain adaptation algorithm for contextual bandits, combining theoretical analysis with practical effectiveness.

Findings

01. Outperforms state-of-the-art algorithms on real-world datasets

02. Maintains sub-linear regret across domains

03. Effective adaptation with limited target domain feedback

Abstract

Contextual bandit algorithms are essential for solving real-world decision-making problems. In practice, collecting a contextual bandit's feedback from different domains may involve different costs. For example, measuring drug reactions in mice (a source domain) is cheaper than measuring them in humans (a target domain). Unfortunately, adapting a contextual bandit algorithm from a source domain to a target domain under distribution shift remains a major and largely unexplored challenge. In this paper, we introduce the first general domain adaptation method for contextual bandits. Our approach learns a bandit model for the target domain by collecting feedback from the source domain. Our theoretical analysis shows that our algorithm maintains a sub-linear regret bound even when adapting across domains. Empirical results show that our approach outperforms the state-of-the-art contextual bandit algorithms on…
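For reference, "sub-linear regret" can be stated as follows. The notation is an assumption made for illustration and is not taken from the paper: $n$ is the number of rounds, $x_t$ the context at round $t$, $a_t$ the chosen action, $a_t^*$ the best action for $x_t$, and $r$ the expected reward.

$$
\mathrm{Reg}(n) = \sum_{t=1}^{n} \bigl( r(x_t, a_t^*) - r(x_t, a_t) \bigr),
\qquad
\lim_{n \to \infty} \frac{\mathrm{Reg}(n)}{n} = 0 .
$$

A bound such as $\mathrm{Reg}(n) = O(\sqrt{n})$ satisfies this condition, so the average per-round loss relative to the best action vanishes as $n$ grows.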

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01·Rating 8·Confidence 3

Strengths

1. The design of the new algorithm originates from the observation that leveraging data across domains leads to sub-linear regret, which makes the whole method simple yet elegant.
2. A theoretical proof is provided to support the performance of the method.
3. The algorithm is extensively tested on three datasets.

Weaknesses

It would be helpful to elaborate on why the data divergence term is sub-linear in the proof.

Reviewer 02·Rating 6·Confidence 3

Strengths

(1) The proposed problem setting of contextual bandits in a domain adaptation scenario is interesting and challenging.
(2) The method utilizes unlabeled data from both source and target domains for effective representation learning and alignment across different domains.
(3) The algorithm is capable of attaining a sub-linear regret bound in the target domain by solving an online network lasso problem with time-dependent regularization.
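To make the idea of "alignment across different domains" in point (2) concrete, here is a minimal sketch of one classic alignment penalty: the squared distance between the mean feature vectors of a source batch and a target batch (a linear-kernel MMD). This is a generic stand-in, not the paper's loss, which is not shown on this page; the function names `linear_mmd2` and `combined_loss` and the `weight` parameter are hypothetical.

```python
import numpy as np


def linear_mmd2(source_feats, target_feats):
    """Squared distance between the mean feature vectors of two batches.

    Generic alignment penalty (linear-kernel MMD); an assumption for
    illustration, not the loss used in the paper.
    """
    diff = source_feats.mean(axis=0) - target_feats.mean(axis=0)
    return float(diff @ diff)


def combined_loss(pred, reward, source_feats, target_feats, weight=1.0):
    """Squared-error fit on labeled source feedback plus a weighted alignment term."""
    fit = float(np.mean((np.asarray(pred) - np.asarray(reward)) ** 2))
    return fit + weight * linear_mmd2(source_feats, target_feats)
```

The alignment term needs no rewards, which is why unlabeled contexts from both domains are enough to compute it; only the fit term uses feedback.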

Weaknesses

The method leverages unlabeled data from both source and target domains to learn robust representations and aligns them effectively across different domains, enabling efficient domain adaptation. While the proposed algorithm builds upon the NeuralLinUCB framework, it introduces an adaptation in the loss function specifically tailored for updating the neural network. This loss function integrates insights from classic domain adaptation techniques, and thereby has some similarity with existing met
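As background for the reference to NeuralLinUCB: methods in that family are commonly described as running a linear UCB rule on top of learned features. The sketch below shows only that linear UCB part on fixed synthetic features. It is an illustration under stated assumptions, not the paper's algorithm: there is no neural network and no domain adaptation loss, and the names `LinUCBHead`, `run_demo`, `alpha`, and `lam` are hypothetical.

```python
import numpy as np


class LinUCBHead:
    """Linear UCB over given feature vectors (illustrative helper only)."""

    def __init__(self, dim, alpha=1.0, lam=1.0):
        self.alpha = alpha            # exploration strength (assumed value)
        self.A = lam * np.eye(dim)    # regularized Gram matrix of chosen features
        self.b = np.zeros(dim)        # reward-weighted sum of chosen features

    def select(self, feats):
        """feats: (n_arms, dim) array; returns the arm with the highest UCB."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b        # ridge-regression estimate
        mean = feats @ theta
        # Per-arm confidence width sqrt(x^T A^{-1} x).
        width = np.sqrt(np.einsum("ij,jk,ik->i", feats, A_inv, feats))
        return int(np.argmax(mean + self.alpha * width))

    def update(self, feat, reward):
        self.A += np.outer(feat, feat)
        self.b += reward * feat


def run_demo(n_rounds=2000, n_arms=5, dim=4, seed=0):
    """Run on a synthetic linear bandit and return cumulative regret per round."""
    rng = np.random.default_rng(seed)
    theta_star = rng.normal(size=dim)
    theta_star /= np.linalg.norm(theta_star)
    head = LinUCBHead(dim)
    cum_regret = np.zeros(n_rounds)
    total = 0.0
    for t in range(n_rounds):
        feats = rng.normal(size=(n_arms, dim))      # one feature vector per arm
        expected = feats @ theta_star
        arm = head.select(feats)
        reward = expected[arm] + 0.1 * rng.normal()  # noisy feedback
        head.update(feats[arm], reward)
        total += expected.max() - expected[arm]
        cum_regret[t] = total
    return cum_regret
```

Calling `run_demo()` returns the cumulative regret curve; on this synthetic problem it should flatten over time, which is the qualitative behavior a sub-linear regret bound describes.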

Reviewer 03·Rating 6·Confidence 3

Strengths

- The authors provide a theoretical justification for the decomposition of the domain-transfer problem in contextual bandits.
- The authors provide an empirical evaluation demonstrating the performance of the algorithm.

Weaknesses

- (Minor) Definition 3.6 is hard for me to understand. In particular, do $\mathbf x$ and $x$ refer to the same thing? Are the authors implicitly assuming $h \in \mathcal H$ and $x \in \mathbb R^n$ in this case? It would be helpful if the authors could rephrase the mathematical formulation here.
- Theorem 3.1 needs further justification. For example, I wonder why sub-linearity in $R_S$ leads to the sub-linear result in $R_T$. I checked the proof for Theorem 3.1 in the appendix but found

Taxonomy

Topics: Advanced Bandit Algorithms Research