Multi-task Representation Learning for Pure Exploration in Linear Bandits
Yihan Du, Longbo Huang, Wen Sun

TL;DR
This paper introduces novel multi-task representation learning algorithms for pure exploration in linear bandits, significantly reducing sample complexity by leveraging shared low-dimensional representations across tasks.
Contribution
It proposes the first algorithms demonstrating how shared representations can accelerate pure exploration in multi-task linear bandit problems.
Findings
Sample complexity is significantly lower than when each task is solved independently.
Algorithms DouExpDes and C-DouExpDes use double experimental designs to plan sample allocations.
First demonstration of representation learning benefits in multi-task pure exploration.
Abstract
Despite the recent success of representation learning in sequential decision making, the study of the pure exploration scenario (i.e., identify the best option and minimize the sample complexity) is still limited. In this paper, we study multi-task representation learning for best arm identification in linear bandits (RepBAI-LB) and best policy identification in contextual linear bandits (RepBPI-CLB), two popular pure exploration settings with wide applications, e.g., clinical trials and web content optimization. In these two problems, all tasks share a common low-dimensional linear representation, and our goal is to leverage this feature to accelerate the best arm (policy) identification process for all tasks. For these problems, we design computationally and sample efficient algorithms DouExpDes and C-DouExpDes, which perform double experimental designs to plan optimal sample…
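To make the shared-representation assumption concrete, here is a minimal sketch of one common way to model it: each task's d-dimensional reward parameter is written as theta_m = B w_m, where B is a d x k matrix shared by all tasks and w_m is a k-dimensional task-specific vector. The factorization, the dimensions, the random arms, and all variable names below are illustrative assumptions, not taken from the abstract, and the code only builds the model; it does not implement DouExpDes or C-DouExpDes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: ambient dimension d, representation dimension k,
# number of tasks M, number of arms per task.
d, k, M, n_arms = 20, 3, 10, 15

# Assumed shared feature extractor B (d x k) with orthonormal columns.
B, _ = np.linalg.qr(rng.standard_normal((d, k)))

# Assumed task-specific low-dimensional parameters, one column per task.
W = rng.standard_normal((k, M))

# Column m of Theta is the full reward parameter theta_m = B @ w_m.
Theta = B @ W

# A hypothetical arm set shared by all tasks; expected reward is linear.
arms = rng.standard_normal((n_arms, d))
expected_rewards = arms @ Theta          # shape (n_arms, M)
best_arms = expected_rewards.argmax(axis=0)  # best arm index per task

# All M parameter vectors lie in a k-dimensional subspace.
rank = np.linalg.matrix_rank(Theta)

# Unknowns to estimate: independently per task vs. with the shared structure.
params_independent = d * M
params_shared = d * k + k * M

print("rank of Theta:", rank)
print("unknowns (independent):", params_independent)
print("unknowns (shared):", params_shared)
```

Under these assumed sizes the shared model has far fewer unknowns than M independent d-dimensional problems, which is the intuition behind expecting a sample complexity gain from learning the representation jointly.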
Taxonomy
Topics: Advanced Bandit Algorithms Research
