Joint Representation Training in Sequential Tasks with Shared Structure
Aldo Pacchiano, Ofir Nachum, Nilesh Tripuraneni, Peter Bartlett

TL;DR
This paper provides a theoretical analysis of joint representation training in multi-task reinforcement learning, demonstrating improved regret bounds and efficient algorithms leveraging shared low-dimensional structures.
Contribution
It introduces the Shared-MatrixRL algorithm for multitask RL with shared low-dimensional representations and proves regret bounds showing benefits over single-task approaches.
Findings
Regret bounds are improved from $O(PHd\sqrt{NH})$ to $O((Hd\sqrt{rP} + HP\sqrt{rd})\sqrt{NH})$.
Shared low-dimensional representations lead to better learning efficiency in multitask RL.
Efficient algorithms are developed using quadratic programming reductions.
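To get a feel for the size of the improvement in the first finding, the two regret expressions can be compared numerically. The snippet below is a minimal sketch, not part of the paper: it drops all constants and logarithmic factors hidden by the $O(\cdot)$ notation, reads $P$ as the number of tasks, $d$ as the task dimension, $r$ as the dimension of the shared representation, $H$ as the horizon and $N$ as the number of episodes, and uses hypothetical values for all of them. The function names are illustrative only.

```python
import math


def single_task_bound(P, H, d, N):
    """Leading term P*H*d*sqrt(N*H): tasks learned independently.

    Constants and log factors hidden by O(.) are ignored (assumption).
    """
    return P * H * d * math.sqrt(N * H)


def shared_bound(P, H, d, r, N):
    """Leading term (H*d*sqrt(r*P) + H*P*sqrt(r*d)) * sqrt(N*H).

    Same caveat: only the leading term, no constants or log factors.
    """
    return (H * d * math.sqrt(r * P) + H * P * math.sqrt(r * d)) * math.sqrt(N * H)


def improvement_ratio(P, H, d, r, N):
    """How many times smaller the shared-representation term is."""
    return single_task_bound(P, H, d, N) / shared_bound(P, H, d, r, N)


if __name__ == "__main__":
    # Hypothetical problem sizes, chosen only for illustration.
    ratio = improvement_ratio(P=100, H=10, d=100, r=4, N=1000)
    print(f"single-task / shared leading terms: {ratio:.2f}")
```

Because both expressions share the factor $H\sqrt{NH}$, the ratio depends only on $P$, $d$ and $r$: it equals $Pd / (d\sqrt{rP} + P\sqrt{rd})$, so the gain grows as $r$ shrinks relative to $d$ and $P$.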
Abstract
Classical theory in reinforcement learning (RL) predominantly focuses on the single task setting, where an agent learns to solve a task through trial-and-error experience, given access to data only from that task. However, many recent empirical works have demonstrated the significant practical benefits of leveraging a joint representation trained across multiple, related tasks. In this work we theoretically analyze such a setting, formalizing the concept of task relatedness as a shared state-action representation that admits linear dynamics in all the tasks. We introduce the Shared-MatrixRL algorithm for the setting of Multitask MatrixRL. In the presence of $P$ episodic tasks of dimension $d$ sharing a joint $r$-dimensional representation, we show the regret on the $P$ tasks can be improved from $O(PHd\sqrt{NH})$ to $O((Hd\sqrt{rP} + HP\sqrt{rd})\sqrt{NH})$ over …
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Age of Information Optimization
