Transfer Learning for Contextual Multi-armed Bandits

Changxiao Cai; T. Tony Cai; Hongzhe Li

arXiv:2211.12612 · stat.ML · January 26, 2024 · 1 citation

TL;DR

This paper develops a transfer learning approach for nonparametric contextual multi-armed bandits under covariate shift, establishing minimax regret rates and proposing algorithms that leverage source domain data for improved learning.

Contribution

The paper introduces a transfer learning algorithm that attains the minimax regret for nonparametric contextual bandits under covariate shift, together with a data-driven algorithm that adapts to unknown parameters while remaining near-optimal up to a logarithmic factor.

Findings

01. The minimax rate of convergence for cumulative regret is established.

02. The proposed algorithms achieve near-optimal regret bounds.

03. Utilizing source domain data improves learning efficiency in the target domain.

Abstract

Motivated by a range of applications, we study in this paper the problem of transfer learning for nonparametric contextual multi-armed bandits under the covariate shift model, where we have data collected on source bandits before the start of the target bandit learning. The minimax rate of convergence for the cumulative regret is established and a novel transfer learning algorithm that attains the minimax regret is proposed. The results quantify the contribution of the data from the source domains for learning in the target domain in the context of nonparametric contextual multi-armed bandits. In view of the general impossibility of adaptation to unknown smoothness, we develop a data-driven algorithm that achieves near-optimal statistical guarantees (up to a logarithmic factor) while automatically adapting to the unknown parameters over a large collection of parameter spaces under an…
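
The setting in the abstract can be made concrete with a small simulation. The sketch below is not the authors' algorithm: it is a plain per-bin UCB policy on a one-dimensional context, warm-started with source samples whose contexts come from a different distribution than the target contexts while the reward functions stay the same (covariate shift). The names `BinnedUCB`, `mean_rewards`, and `run`, the Beta(2, 5) source distribution, the two reward functions, and the noise level are all assumptions chosen for illustration.

```python
import numpy as np


class BinnedUCB:
    """Per-bin UCB on a context in [0, 1]. Hypothetical illustration only."""

    def __init__(self, n_bins, n_arms):
        self.n_bins = n_bins
        self.counts = np.zeros((n_bins, n_arms))
        self.sums = np.zeros((n_bins, n_arms))

    def bin_of(self, x):
        # Map a context to its bin index, clamping x = 1.0 into the last bin.
        return min(int(x * self.n_bins), self.n_bins - 1)

    def update(self, x, arm, reward):
        b = self.bin_of(x)
        self.counts[b, arm] += 1
        self.sums[b, arm] += reward

    def select(self, x, t):
        b = self.bin_of(x)
        n = self.counts[b]
        if np.any(n == 0):
            # Pull an arm that has no data in this bin yet.
            return int(np.argmin(n))
        means = self.sums[b] / n
        bonus = np.sqrt(2.0 * np.log(t + 1) / n)
        return int(np.argmax(means + bonus))


def mean_rewards(x):
    # Assumed reward functions, shared by source and target domains.
    return np.array([0.5, x])


def run(n_source, n_target, n_bins=10, seed=0):
    """Return the cumulative target regret after a warm start on source data."""
    rng = np.random.default_rng(seed)
    policy = BinnedUCB(n_bins, n_arms=2)

    # Source phase: contexts from Beta(2, 5), arms chosen uniformly at random.
    for _ in range(n_source):
        x = rng.beta(2.0, 5.0)
        arm = int(rng.integers(2))
        policy.update(x, arm, mean_rewards(x)[arm] + 0.1 * rng.normal())

    # Target phase: contexts from Uniform(0, 1), arms chosen by the policy.
    regret = 0.0
    for t in range(1, n_target + 1):
        x = rng.uniform()
        arm = policy.select(x, t)
        mu = mean_rewards(x)
        policy.update(x, arm, mu[arm] + 0.1 * rng.normal())
        regret += mu.max() - mu[arm]
    return regret
```

Comparing `run(0, 2000)` with `run(500, 2000)` across several seeds gives a rough feel for how source data can help. Because the assumed source contexts concentrate on small values, the warm start mostly informs the low-context bins, which is the kind of mismatch that a covariate shift analysis has to account for.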

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Bandit Algorithms Research