Joint Representation Training in Sequential Tasks with Shared Structure

Aldo Pacchiano; Ofir Nachum; Nilesh Tripuraneni; Peter Bartlett

arXiv:2206.12441·cs.LG·June 28, 2022

TL;DR

This paper theoretically analyzes joint representation training in multitask reinforcement learning, proving improved regret bounds and giving efficient algorithms that exploit a shared low-dimensional structure.

Contribution

It introduces the Shared-MatrixRL algorithm for multitask RL with shared low-dimensional representations and proves regret bounds showing benefits over single-task approaches.

Findings

01

Regret bounds are improved from $O(PHd\sqrt{NH})$ to $O((Hd\sqrt{rP} + HP\sqrt{rd})\sqrt{NH})$.
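
To get a feel for the size of this improvement, the two leading terms can be compared numerically. The sketch below is illustrative only: it drops constants and logarithmic factors, the function names are hypothetical, and the values of $P$, $H$, $d$, $r$, and $N$ are made-up inputs rather than settings from the paper. Dividing the shared bound by the single-task bound cancels $H$ and $N$ and leaves $\sqrt{r/P} + \sqrt{r/d}$.

```python
import math


def single_task_bound(P, H, d, N):
    """Leading term P*H*d*sqrt(N*H): each of the P tasks is learned on its own."""
    return P * H * d * math.sqrt(N * H)


def shared_bound(P, H, d, r, N):
    """Leading term (H*d*sqrt(r*P) + H*P*sqrt(r*d)) * sqrt(N*H) with a shared rank-r representation."""
    return (H * d * math.sqrt(r * P) + H * P * math.sqrt(r * d)) * math.sqrt(N * H)


def improvement_ratio(P, d, r):
    """shared_bound / single_task_bound; H and N cancel, leaving sqrt(r/P) + sqrt(r/d)."""
    return math.sqrt(r / P) + math.sqrt(r / d)


# Hypothetical problem sizes, chosen only for illustration.
P, H, d, r, N = 100, 10, 100, 4, 1000

ratio = shared_bound(P, H, d, r, N) / single_task_bound(P, H, d, N)
print(f"shared / single = {ratio:.3f}")
print(f"closed form     = {improvement_ratio(P, d, r):.3f}")
```

With these inputs the ratio is about 0.4. The ratio drops below 1 only when $r$ is small relative to both $P$ and $d$; at $r = d$ it is at least 1, so the shared bound gives no gain in that regime.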

02

Shared low-dimensional representations lead to better learning efficiency in multitask RL.

03

Efficient algorithms are developed using quadratic programming reductions.

Abstract

Classical theory in reinforcement learning (RL) predominantly focuses on the single task setting, where an agent learns to solve a task through trial-and-error experience, given access to data only from that task. However, many recent empirical works have demonstrated the significant practical benefits of leveraging a joint representation trained across multiple, related tasks. In this work we theoretically analyze such a setting, formalizing the concept of task relatedness as a shared state-action representation that admits linear dynamics in all the tasks. We introduce the Shared-MatrixRL algorithm for the setting of Multitask MatrixRL. In the presence of $P$ episodic tasks of dimension $d$ sharing a joint $r \ll d$ low-dimensional representation, we show the regret on the $P$ tasks can be improved from $O(PHd\sqrt{NH})$ to $O((Hd\sqrt{rP} + HP\sqrt{rd})\sqrt{NH})$ over $N$ …

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Age of Information Optimization