Near-optimal Representation Learning for Linear Bandits and Linear RL

Jiachen Hu; Xiaoyu Chen; Chi Jin; Lihong Li; Liwei Wang

arXiv:2102.04132 · cs.LG · February 9, 2021 · 6 cites

TL;DR

This paper introduces a sample-efficient algorithm for multi-task linear bandits and RL that leverages a shared low-dimensional representation across tasks, improving regret over solving each task independently, and proves a lower bound showing the result is near-optimal when $d > M$.

Contribution

The paper develops the first theoretical algorithm for multi-task representation learning in linear bandits and RL, achieving near-optimal regret bounds.

Findings

01

Achieves regret of $\tilde{O}(M\sqrt{dkT} + d\sqrt{kMT})$ in multi-task linear bandits.

02

Provides a lower bound showing near-optimality when d > M.

03

Extends the algorithm and analysis to multi-task episodic RL with linear value function approximation under low inherent Bellman error.
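To make the size of the improvement concrete, here is a small sketch that evaluates the MTLR-OFUL bound $\tilde{O}(M\sqrt{dkT} + d\sqrt{kMT})$ and the independent per-task baseline $\tilde{O}(Md\sqrt{T})$ with constants and logarithmic factors dropped. The function names and problem sizes are hypothetical, and the numbers compare orders of growth only, not actual regret.

```python
import math


def mtlr_bound(M, d, k, T):
    """Order of the multi-task bound M*sqrt(d*k*T) + d*sqrt(k*M*T).

    Constants and log factors are dropped, so this is a growth-rate proxy only.
    """
    return M * math.sqrt(d * k * T) + d * math.sqrt(k * M * T)


def independent_bound(M, d, T):
    """Order of the baseline M*d*sqrt(T) from solving each task independently."""
    return M * d * math.sqrt(T)


# Hypothetical problem sizes chosen so that k << d and k << M.
M, d, k, T = 100, 50, 2, 10_000
shared = mtlr_bound(M, d, k, T)
baseline = independent_bound(M, d, T)
print(f"shared: {shared:.0f}, independent: {baseline:.0f}, ratio: {baseline / shared:.1f}")
```

Dividing the two expressions gives a ratio of $1/(\sqrt{k/d} + \sqrt{k/M})$, which does not depend on $T$ and grows as $k$ shrinks relative to both $d$ and $M$; with the sizes above it is about 2.9.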

Abstract

This paper studies representation learning for multi-task linear bandits and multi-task episodic RL with linear value function approximation. We first consider the setting where we play $M$ linear bandits with dimension $d$ concurrently, and these bandits share a common $k$-dimensional linear representation so that $k \ll d$ and $k \ll M$. We propose a sample-efficient algorithm, MTLR-OFUL, which leverages the shared representation to achieve $\tilde{O}(M\sqrt{dkT} + d\sqrt{kMT})$ regret, with $T$ being the number of total steps. Our regret significantly improves upon the baseline $\tilde{O}(Md\sqrt{T})$ achieved by solving each task independently. We further develop a lower bound that shows our regret is near-optimal when $d > M$. Furthermore, we extend the algorithm and analysis to multi-task episodic RL with linear value function approximation under low inherent Bellman error…
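The shared-representation setting can be made concrete with a small instance generator. A common way to formalize a shared $k$-dimensional linear representation is $\theta_i = B w_i$, where $B \in \mathbb{R}^{d \times k}$ is shared by all $M$ tasks and $w_i \in \mathbb{R}^k$ is task-specific. The sketch below assumes that parameterization, orthonormal columns for $B$, and Gaussian draws; the function names are hypothetical, and it only builds problem instances. It is not MTLR-OFUL.

```python
import numpy as np


def make_shared_representation_tasks(M, d, k, seed=0):
    """Hypothetical generator: theta_i = B @ w_i for i = 1..M.

    B is a shared d x k matrix with orthonormal columns (assumption);
    W holds the task-specific k-dim weights w_i as its columns.
    """
    rng = np.random.default_rng(seed)
    B, _ = np.linalg.qr(rng.standard_normal((d, k)))  # reduced QR: B is d x k
    W = rng.standard_normal((k, M))                   # column i is w_i
    Theta = B @ W                                     # column i is theta_i, shape d x M
    return B, W, Theta


def expected_reward(Theta, task, x):
    """Mean reward of action x (length-d vector) in the given task: <x, theta_task>."""
    return float(x @ Theta[:, task])


B, W, Theta = make_shared_representation_tasks(M=20, d=30, k=3)
print(Theta.shape, np.linalg.matrix_rank(Theta))
```

Stacking the $M$ task parameters gives a $d \times M$ matrix of rank at most $k$ (here 3), which is the low-dimensional structure a multi-task learner can exploit instead of estimating all $Md$ coordinates separately.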

Videos

Near-Optimal Representation Learning for Linear Bandits and Linear RL · SlidesLive

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms