Towards Domain Adaptive Neural Contextual Bandits
Ziyan Wang, Xiaoming Huo, Hao Wang

TL;DR
This paper introduces a novel domain adaptation method for contextual bandits, enabling models to adapt across different domains with distribution shifts while maintaining low regret, supported by theoretical guarantees and empirical success.
Contribution
It presents the first general domain adaptation algorithm for contextual bandits, combining theoretical analysis with practical effectiveness.
Findings
Outperforms state-of-the-art algorithms on real-world datasets
Maintains sub-linear regret across domains
Effective adaptation with limited target domain feedback
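For reference, the sub-linear regret claim can be written in the standard bandit form. The notation here is generic and assumed for illustration, not taken from the paper: $n$ is the number of rounds, $x_t$ the context at round $t$, $a_t$ the chosen action, $a_t^*$ the best action for $x_t$, and $r(x, a)$ the expected reward.

$$\mathrm{Reg}(n) = \sum_{t=1}^{n} \bigl( r(x_t, a_t^*) - r(x_t, a_t) \bigr), \qquad \lim_{n \to \infty} \frac{\mathrm{Reg}(n)}{n} = 0 .$$

A bound such as $\mathrm{Reg}(n) = O(\sqrt{n})$ is sub-linear in this sense: the average per-round loss relative to the best action vanishes as $n$ grows.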
Abstract
Contextual bandit algorithms are essential for solving real-world decision-making problems. In practice, collecting a contextual bandit's feedback from different domains may involve different costs; for example, measuring drug reactions in mice (a source domain) versus in humans (a target domain). Unfortunately, adapting a contextual bandit algorithm from a source domain to a target domain with distribution shift remains a major challenge and is largely unexplored. In this paper, we introduce the first general domain adaptation method for contextual bandits. Our approach learns a bandit model for the target domain by collecting feedback from the source domain. Our theoretical analysis shows that our algorithm maintains a sub-linear regret bound even when adapting across domains. Empirical results show that our approach outperforms the state-of-the-art contextual bandit algorithms on…
Peer Reviews
Decision: ICLR 2025 Poster
1. The design of the new algorithm originates from the observation that leveraging data across domains leads to sub-linear regret, which makes the whole method simple yet elegant. 2. A theoretical proof is provided to support the performance of the method. 3. The algorithm is extensively tested on three datasets.
It would be helpful to elaborate on why the data divergence term is sub-linear in the proof.
(1) The proposed problem setting of contextual bandits in a domain adaptation scenario is interesting and challenging. (2) The method utilizes unlabeled data from both source and target domains for effective representation learning and alignment across different domains. (3) The algorithm is capable of attaining a sub-linear regret bound in the target domain by solving an online network lasso problem with time-dependent regularization.
The method leverages unlabeled data from both source and target domains to learn robust representations and aligns them effectively across different domains, enabling efficient domain adaptation. While the proposed algorithm builds upon the NeuralLinUCB framework, it introduces an adaptation in the loss function specifically tailored for updating the neural network. This loss function integrates insights from classic domain adaptation techniques, and thereby has some similarity with existing met
- The authors provide a theoretical justification on the decomposition for the domain-transfer issue for contextual bandits.
- The authors provide empirical evaluation demonstrating the performance of the algorithm.
- (Minor) Definition 3.6 is hard for me to understand. In particular, do $\mathbf x$ and $x$ refer to the same thing? Are the authors implicitly assuming $h \in \mathcal H$ and $x \in \mathbb R^n$ in this case? It would be helpful if the authors could rephrase the mathematical formulation.
- Theorem 3.1 needs further justification. For example, I wonder why sub-linearity in $R_S$ leads to the sub-linear result in $R_T$. I checked the proof of Theorem 3.1 in the appendix but found
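The reviews above note that the algorithm builds on the NeuralLinUCB framework, which runs a linear UCB rule on top of a learned representation. As a rough illustration of that linear UCB component only, here is a minimal sketch on fixed, synthetic features. It is not the paper's algorithm: there is no neural network, no source/target domains, and no alignment loss. The class name `LinUCB`, the helper `run`, and all hyperparameters are hypothetical choices for this example.

```python
import numpy as np


class LinUCB:
    """Linear UCB with one shared parameter vector (illustrative sketch only)."""

    def __init__(self, dim, alpha=1.0, lam=1.0):
        self.alpha = alpha            # exploration weight (assumed value)
        self.A = lam * np.eye(dim)    # regularized Gram matrix of chosen features
        self.b = np.zeros(dim)        # reward-weighted sum of chosen features

    def select(self, features):
        """features: array of shape (n_arms, dim); returns the chosen arm index."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b                      # ridge-regression estimate
        mean = features @ theta                     # predicted reward per arm
        # Confidence width sqrt(x^T A^{-1} x) for each arm's feature vector x.
        width = np.sqrt(np.einsum("ij,jk,ik->i", features, A_inv, features))
        return int(np.argmax(mean + self.alpha * width))

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x


def run(n_rounds=2000, n_arms=5, dim=8, seed=0):
    """Simulate a synthetic linear bandit and return cumulative regret per round."""
    rng = np.random.default_rng(seed)
    theta_true = rng.normal(size=dim)
    theta_true /= np.linalg.norm(theta_true)
    agent = LinUCB(dim)
    regret = np.zeros(n_rounds)
    for t in range(n_rounds):
        feats = rng.normal(size=(n_arms, dim))
        feats /= np.linalg.norm(feats, axis=1, keepdims=True)
        expected = feats @ theta_true               # true expected rewards
        arm = agent.select(feats)
        reward = expected[arm] + 0.1 * rng.normal() # noisy observed feedback
        agent.update(feats[arm], reward)
        regret[t] = expected.max() - expected[arm]  # per-round regret
    return np.cumsum(regret)


cum_regret = run()
```

Plotting `cum_regret` against the round index gives a quick visual check of whether regret growth flattens over time in this toy setting; it says nothing about the cross-domain behavior the reviewers discuss.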
Taxonomy
Topics: Advanced Bandit Algorithms Research
