Near-optimal Representation Learning for Linear Bandits and Linear RL
Jiachen Hu, Xiaoyu Chen, Chi Jin, Lihong Li, Liwei Wang

TL;DR
This paper introduces a sample-efficient algorithm for multi-task linear bandits and RL that exploits a shared low-dimensional representation. Its regret improves on solving each task independently, and a lower bound shows it is near-optimal when d > M.
Contribution
The paper develops the first algorithms with theoretical guarantees for multi-task representation learning in linear bandits and linear RL, achieving near-optimal regret bounds.
Findings
Achieves regret of Õ(M√(dkT) + d√(kMT)) in multi-task linear bandits, for M tasks of dimension d sharing a k-dimensional representation over T total steps.
Provides a lower bound showing near-optimality when d > M.
Extends results to multi-task episodic RL with linear value functions.
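To get a feel for the size of the improvement, the following minimal sketch evaluates the leading terms of the multi-task bound, M√(dkT) + d√(kMT), against the M·d√T rate of running a standard d-dimensional linear bandit algorithm on each task separately. Logarithmic factors and constants are dropped, the function names are hypothetical, and the problem sizes are made up for illustration; none of this comes from the paper's experiments.

```python
import math

def multitask_bound(M, d, k, T):
    """Leading terms M*sqrt(dkT) + d*sqrt(kMT); logs and constants dropped."""
    return M * math.sqrt(d * k * T) + d * math.sqrt(k * M * T)

def independent_bound(M, d, T):
    """M separate d-dimensional linear bandits at d*sqrt(T) each; logs and constants dropped."""
    return M * d * math.sqrt(T)

# Hypothetical sizes chosen only for illustration, with k much smaller than d and M.
M, d, k, T = 50, 100, 4, 10_000

shared = multitask_bound(M, d, k, T)
separate = independent_bound(M, d, T)
ratio = separate / shared

print(f"shared-representation bound: {shared:.0f}")
print(f"independent-tasks bound:     {separate:.0f}")
print(f"ratio (independent / shared): {ratio:.2f}")
```

With T factored out, the ratio reduces to √(Md) / (√k (√M + √d)), so the gain grows as k shrinks relative to both d and M.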
Abstract
This paper studies representation learning for multi-task linear bandits and multi-task episodic RL with linear value function approximation. We first consider the setting where we play M linear bandits with dimension d concurrently, and these bandits share a common k-dimensional linear representation so that k ≪ d and k ≪ M. We propose a sample-efficient algorithm, MTLR-OFUL, which leverages the shared representation to achieve Õ(M√(dkT) + d√(kMT)) regret, with T being the number of total steps. Our regret significantly improves upon the baseline Õ(Md√T) achieved by solving each task independently. We further develop a lower bound that shows our regret is near-optimal when d > M. Furthermore, we extend the algorithm and analysis to multi-task episodic RL with linear value function approximation under low inherent Bellman error…
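One way to picture the shared-representation assumption is to write each task's d-dimensional parameter as θ_i = B w_i, where B is a d × k matrix common to all tasks and w_i is a k-dimensional task-specific vector. The sketch below builds such parameters and counts the free parameters with and without sharing. The factorization, the helper name make_shared_tasks, and the sizes are illustrative assumptions, not the paper's construction or its algorithm.

```python
import random

def make_shared_tasks(M, d, k, seed=0):
    """Build M task parameters theta_i = B w_i from one shared d x k matrix B (illustrative)."""
    rng = random.Random(seed)
    # Shared representation: d rows, k columns.
    B = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(d)]
    # One k-dimensional weight vector per task.
    W = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(M)]
    # theta_i[r] = sum over c of B[r][c] * w_i[c]
    thetas = [[sum(B[r][c] * w[c] for c in range(k)) for r in range(d)] for w in W]
    return B, W, thetas

# Hypothetical sizes with k much smaller than d and M.
M, d, k = 50, 100, 4
B, W, thetas = make_shared_tasks(M, d, k)

independent_params = M * d        # every task learned from scratch
shared_params = d * k + M * k     # one shared B plus a small w_i per task

print(f"independent: {independent_params} parameters")
print(f"shared:      {shared_params} parameters")
```

The drop in parameter count is the intuition behind the improved regret: the cost of learning B is spread across all M tasks, and each task then only needs its own k-dimensional weights.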
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms
