Theoretically-Grounded Policy Advice from Multiple Teachers in Reinforcement Learning Settings with Applications to Negative Transfer
Yusen Zhan, Haitham Bou Ammar, Matthew E. Taylor

TL;DR
This paper introduces a formal framework for advice from multiple teachers in reinforcement learning. It provides regret bounds for a student that combines its own exploration with teacher advice, and it characterizes the conditions under which negative transfer occurs.
Contribution
It formalizes multi-teacher advice in RL, proposes an algorithm that combines autonomous exploration with teacher advice, and uses regret bounds to analyze negative transfer.
Findings
Good teachers improve learning speed
Bad teachers can hinder learning
Negative transfer can be formally characterized
Abstract
Policy advice is a transfer learning method where a student agent is able to learn faster via advice from a teacher. However, both this and other reinforcement learning transfer methods have little theoretical analysis. This paper formally defines a setting where multiple teacher agents can provide advice to a student and introduces an algorithm to leverage both autonomous exploration and teachers' advice. Our regret bounds justify the intuition that good teachers help while bad teachers hurt. Using our formalization, we are also able to quantify, for the first time, when negative transfer can occur within such a reinforcement learning setting.
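The intuition in the abstract (good teachers help, bad teachers hurt) can be illustrated with a toy example. The sketch below is not the paper's algorithm or its setting: it assumes a three-armed Bernoulli bandit, teachers that each always recommend one fixed arm, and a student that follows advice with a fixed probability and otherwise runs epsilon-greedy on its own estimates. The function name `run_student` and all parameters (`arm_means`, `teacher_arms`, `advice_prob`, `epsilon`) are hypothetical and chosen only for illustration.

```python
import random


def run_student(arm_means, teacher_arms, steps=2000, advice_prob=0.5,
                epsilon=0.1, seed=0):
    """Return the cumulative pseudo-regret of a toy advice-taking student.

    arm_means:    true Bernoulli success probability of each arm (assumed).
    teacher_arms: the arm each teacher always recommends (assumed fixed advice).
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms        # number of pulls per arm
    estimates = [0.0] * n_arms   # running mean reward per arm
    best_mean = max(arm_means)
    regret = 0.0

    for _ in range(steps):
        if teacher_arms and rng.random() < advice_prob:
            # Follow the advice of a uniformly chosen teacher.
            arm = rng.choice(teacher_arms)
        elif rng.random() < epsilon:
            # Autonomous exploration: pull a random arm.
            arm = rng.randrange(n_arms)
        else:
            # Autonomous exploitation: pull the arm with the best estimate.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        # Pseudo-regret: gap between the best arm's mean and the pulled arm's mean.
        regret += best_mean - arm_means[arm]

    return regret


arm_means = [0.2, 0.5, 0.8]
good = run_student(arm_means, teacher_arms=[2])  # teacher recommends the best arm
bad = run_student(arm_means, teacher_arms=[0])   # teacher recommends the worst arm
alone = run_student(arm_means, teacher_arms=[])  # no teachers, exploration only
print(f"good teacher: {good:.1f}, bad teacher: {bad:.1f}, no teacher: {alone:.1f}")
```

In this toy setup, comparing `bad` against `alone` gives a rough picture of negative transfer: advice that steers the student toward a poor arm adds regret that the student would not have incurred on its own. The paper's formal conditions are stated in terms of its regret bounds, which this sketch does not reproduce.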
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Advanced Bandit Algorithms Research
