Robust Knowledge Transfer in Tiered Reinforcement Learning
Jiawei Huang, Niao He

TL;DR
This paper introduces a robust transfer learning framework for tiered reinforcement learning. It transfers knowledge between tasks that may differ in dynamics and rewards, including from multiple source tasks, to improve learning efficiency and regret bounds.
Contribution
It proposes novel algorithms for robust knowledge transfer in tiered RL without assuming task similarity, and introduces a transfer source selection mechanism for multiple low-tier tasks.
Findings
Under the Optimal Value Dominance condition, the high-tier task achieves constant regret on a subset of states that depends on the task similarity.
The high-tier task retains near-optimal regret even when the two tasks are dissimilar.
With multiple low-tier source tasks, the benefits of transfer extend to a larger state-action space.
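To make the term "constant regret" concrete, the sketch below computes cumulative regret under the standard episodic definition: the running sum of gaps between the optimal value and the value of the policy played in each episode. The function name and the toy numbers are hypothetical illustrations, not taken from the paper; the point is only that regret is "constant" when this running sum stops growing with the number of episodes.

```python
from itertools import accumulate


def cumulative_regret(optimal_value, episode_values):
    """Running sum of per-episode gaps (optimal value minus achieved value).

    Hypothetical helper for illustration; not the paper's algorithm.
    """
    gaps = [optimal_value - v for v in episode_values]
    return list(accumulate(gaps))


# Toy example (assumed numbers): the learner is suboptimal only in episode 1,
# so the cumulative regret stays flat afterwards, i.e. it is constant.
flat = cumulative_regret(1.0, [0.5, 1.0, 1.0, 1.0])
```

A learner that keeps paying a positive gap in every episode would instead see this list grow without bound, which is the behavior the near-optimal regret guarantee limits but does not eliminate.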
Abstract
In this paper, we study the Tiered Reinforcement Learning setting, a parallel transfer learning framework, where the goal is to transfer knowledge from the low-tier (source) task to the high-tier (target) task to reduce the exploration risk of the latter while solving the two tasks in parallel. Unlike previous work, we do not assume the low-tier and high-tier tasks share the same dynamics or reward functions, and focus on robust knowledge transfer without prior knowledge of the task similarity. We identify a natural and necessary condition called the "Optimal Value Dominance" for our objective. Under this condition, we propose novel online learning algorithms such that, for the high-tier task, it can achieve constant regret on partial states depending on the task similarity and retain near-optimal regret when the two tasks are dissimilar, while for the low-tier task, it can keep…
Taxonomy
Topics: Neural dynamics and brain function · Neural Networks and Reservoir Computing · Adaptive Dynamic Programming Control
